author | Joseph K. Bradley <joseph@databricks.com> | 2015-02-25 16:13:17 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-02-25 16:13:17 -0800 |
commit | d20559b157743981b9c09e286f2aaff8cbefab59 (patch) | |
tree | 6d92015c1ae6b05c725860685351f86b8c4ed6af /docs/mllib-naive-bayes.md | |
parent | 46a044a36a2aff1306f7f677e952ce253ddbefac (diff) | |
[SPARK-5974] [SPARK-5980] [mllib] [python] [docs] Update ML guide with save/load, Python GBT
* Add GradientBoostedTrees Python examples to ML guide
* I ran these in the pyspark shell, and they worked.
* Add save/load to examples in ML guide
* Added note to Python docs about predict and transform not working within RDD actions and transformations in some cases (see SPARK-5981)
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #4750 from jkbradley/SPARK-5974 and squashes the following commits:
c410e38 [Joseph K. Bradley] Added note to LabeledPoint about attributes
bcae18b [Joseph K. Bradley] Added import of models for save/load examples in ml guide. Fixed line length for tree.py, feature.py (but not other ML Pyspark files yet).
6d81c3e [Joseph K. Bradley] completed python GBT examples
9903309 [Joseph K. Bradley] Added note to python docs about predict,transform not working within RDD actions,transformations in some cases
c7dfad8 [Joseph K. Bradley] Added model save/load to ML guide. Added GBT examples to ML guide
Diffstat (limited to 'docs/mllib-naive-bayes.md')
-rw-r--r-- | docs/mllib-naive-bayes.md | 10 |
1 files changed, 9 insertions, 1 deletions
diff --git a/docs/mllib-naive-bayes.md b/docs/mllib-naive-bayes.md
index d5b044d94f..81173255b5 100644
--- a/docs/mllib-naive-bayes.md
+++ b/docs/mllib-naive-bayes.md
@@ -37,7 +37,7 @@ smoothing parameter `lambda` as input, and output a
 can be used for evaluation and prediction.
 
 {% highlight scala %}
-import org.apache.spark.mllib.classification.NaiveBayes
+import org.apache.spark.mllib.classification.{NaiveBayes, NaiveBayesModel}
 import org.apache.spark.mllib.linalg.Vectors
 import org.apache.spark.mllib.regression.LabeledPoint
 
@@ -55,6 +55,9 @@ val model = NaiveBayes.train(training, lambda = 1.0)
 
 val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
 val accuracy = 1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / test.count()
+
+model.save("myModelPath")
+val sameModel = NaiveBayesModel.load("myModelPath")
 {% endhighlight %}
 
 </div>
@@ -93,6 +96,9 @@ double accuracy = predictionAndLabel.filter(new Function<Tuple2<Double, Double>,
       return pl._1().equals(pl._2());
     }
   }).count() / (double) test.count();
+
+model.save("myModelPath");
+NaiveBayesModel sameModel = NaiveBayesModel.load("myModelPath");
 {% endhighlight %}
 
 </div>
@@ -105,6 +111,8 @@ smoothing parameter `lambda` as input, and output a
 [NaiveBayesModel](api/python/pyspark.mllib.classification.NaiveBayesModel-class.html), which
 can be used for evaluation and prediction.
 
+Note that the Python API does not yet support model save/load but will in the future.
+
 <!-- TODO: Make Python's example consistent with Scala's and Java's. -->
 {% highlight python %}
 from pyspark.mllib.regression import LabeledPoint
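
The context lines of the Scala and Java hunks compute accuracy as the fraction of (prediction, label) pairs whose two elements match. As a reading aid only, here is a minimal plain-Python sketch of that same arithmetic. It is not part of this commit and does not use Spark: an ordinary list stands in for the RDD, and the `accuracy` helper and the sample pairs are hypothetical names and data introduced for illustration.

```python
# Plain-Python sketch of the accuracy expression seen in the diff context:
#   1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / test.count()
# A list of tuples stands in for the RDD; no Spark is involved.


def accuracy(prediction_and_label):
    """Return the fraction of (prediction, label) pairs that agree."""
    if not prediction_and_label:
        raise ValueError("need at least one (prediction, label) pair")
    # Count the pairs where the predicted class equals the true label.
    matches = sum(1 for pred, label in prediction_and_label if pred == label)
    return matches / len(prediction_and_label)


# Hypothetical predictions paired with true labels (made-up data).
pairs = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0)]
print(accuracy(pairs))  # 3 of 4 pairs match -> 0.75
```

Dividing by the number of pairs mirrors the `test.count()` denominator in the documented example, since the mapped collection has one pair per test point.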