[SPARK-5974] [SPARK-5980] [mllib] [python] [docs] Update ML guide with save/load, Python GBT

* Add GradientBoostedTrees Python examples to ML guide
* I ran these in the pyspark shell, and they worked.
* Add save/load to examples in ML guide
* Added note to python docs about predict,transform not working within RDD actions,transformations in some cases (See SPARK-5981)

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #4750 from jkbradley/SPARK-5974 and squashes the following commits:

c410e38 [Joseph K. Bradley] Added note to LabeledPoint about attributes
bcae18b [Joseph K. Bradley] Added import of models for save/load examples in ml guide. Fixed line length for tree.py, feature.py (but not other ML Pyspark files yet).
6d81c3e [Joseph K. Bradley] completed python GBT examples
9903309 [Joseph K. Bradley] Added note to python docs about predict,transform not working within RDD actions,transformations in some cases
c7dfad8 [Joseph K. Bradley] Added model save/load to ML guide. Added GBT examples to ML guide
author: Joseph K. Bradley <joseph@databricks.com> 2015-02-25 16:13:17 -0800
committer: Xiangrui Meng <meng@databricks.com> 2015-02-25 16:13:17 -0800
commit: d20559b157743981b9c09e286f2aaff8cbefab59 (patch)
tree: 6d92015c1ae6b05c725860685351f86b8c4ed6af /docs/mllib-linear-methods.md
parent: 46a044a36a2aff1306f7f677e952ce253ddbefac (diff)
download: spark-d20559b157743981b9c09e286f2aaff8cbefab59.tar.gz
spark-d20559b157743981b9c09e286f2aaff8cbefab59.tar.bz2
spark-d20559b157743981b9c09e286f2aaff8cbefab59.zip
1 files changed, 19 insertions, 2 deletions
diff --git a/docs/mllib-linear-methods.md b/docs/mllib-linear-methods.md
index 44b7f67c57..d9fc63b37d 100644
--- a/docs/mllib-linear-methods.md
+++ b/docs/mllib-linear-methods.md
@@ -190,7 +190,7 @@ error.
 
 {% highlight scala %}
 import org.apache.spark.SparkContext
-import org.apache.spark.mllib.classification.SVMWithSGD
+import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
 import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
 import org.apache.spark.mllib.regression.LabeledPoint
 import org.apache.spark.mllib.linalg.Vectors
@@ -222,6 +222,9 @@ val metrics = new BinaryClassificationMetrics(scoreAndLabels)
 val auROC = metrics.areaUnderROC()
 
 println("Area under ROC = " + auROC)
+
+model.save("myModelPath")
+val sameModel = SVMModel.load("myModelPath")
 {% endhighlight %}
 
 The `SVMWithSGD.train()` method by default performs L2 regularization with the
@@ -304,6 +307,9 @@ public class SVMClassifier {
     double auROC = metrics.areaUnderROC();
     
     System.out.println("Area under ROC = " + auROC);
+
+    model.save("myModelPath");
+    SVMModel sameModel = SVMModel.load("myModelPath");
   }
 }
 {% endhighlight %}
@@ -338,6 +344,8 @@ a dependency.
 The following example shows how to load a sample dataset, build Logistic Regression model,
 and make predictions with the resulting model to compute the training error.
 
+Note that the Python API does not yet support model save/load but will in the future.
+
 {% highlight python %}
 from pyspark.mllib.classification import LogisticRegressionWithSGD
 from pyspark.mllib.regression import LabeledPoint
@@ -391,8 +399,9 @@ values. We compute the mean squared error at the end to evaluate
 [goodness of fit](http://en.wikipedia.org/wiki/Goodness_of_fit).
 
 {% highlight scala %}
-import org.apache.spark.mllib.regression.LinearRegressionWithSGD
 import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.regression.LinearRegressionModel
+import org.apache.spark.mllib.regression.LinearRegressionWithSGD
 import org.apache.spark.mllib.linalg.Vectors
 
 // Load and parse the data
@@ -413,6 +422,9 @@ val valuesAndPreds = parsedData.map { point =>
 }
 val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
 println("training Mean Squared Error = " + MSE)
+
+model.save("myModelPath")
+val sameModel = LinearRegressionModel.load("myModelPath")
 {% endhighlight %}
 
 [`RidgeRegressionWithSGD`](api/scala/index.html#org.apache.spark.mllib.regression.RidgeRegressionWithSGD)
@@ -483,6 +495,9 @@ public class LinearRegression {
       }
     ).rdd()).mean();
     System.out.println("training Mean Squared Error = " + MSE);
+
+    model.save("myModelPath");
+    LinearRegressionModel sameModel = LinearRegressionModel.load("myModelPath");
   }
 }
 {% endhighlight %}
@@ -494,6 +509,8 @@ The example then uses LinearRegressionWithSGD to build a simple linear model to
 values. We compute the mean squared error at the end to evaluate
 [goodness of fit](http://en.wikipedia.org/wiki/Goodness_of_fit).
 
+Note that the Python API does not yet support model save/load but will in the future.
+
 {% highlight python %}
 from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD
 from numpy import array