author: Xiangrui Meng <meng@databricks.com> 2015-03-02 22:27:01 -0800
committer: Xiangrui Meng <meng@databricks.com> 2015-03-02 22:27:01 -0800
commit: 7e53a79c30511dbd0e5d9878a4b8b0f5bc94e68b
tree: 4fc615db1b5144cf7b430ea3bc26bda2cd49cad8 /docs/mllib-decision-tree.md
parent: 54d19689ff8d786acde5b8ada6741854ffadadea
[SPARK-6097][MLLIB] Support tree model save/load in PySpark/MLlib
Similar to `MatrixFactorizationModel`, we only need wrappers to support save/load for tree models in Python. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #4854 from mengxr/SPARK-6097 and squashes the following commits:

4586a4d [Xiangrui Meng] fix more typos
8ebcac2 [Xiangrui Meng] fix python style
91172d8 [Xiangrui Meng] fix typos
201b3b9 [Xiangrui Meng] update user guide
b5158e2 [Xiangrui Meng] support tree model save/load in PySpark/MLlib
Diffstat (limited to 'docs/mllib-decision-tree.md')
-rw-r--r-- docs/mllib-decision-tree.md | 16
1 files changed, 10 insertions, 6 deletions
diff --git a/docs/mllib-decision-tree.md b/docs/mllib-decision-tree.md
index 8e478ab035..c1d0f8a6b1 100644
--- a/docs/mllib-decision-tree.md
+++ b/docs/mllib-decision-tree.md
@@ -293,11 +293,9 @@ DecisionTreeModel sameModel = DecisionTreeModel.load(sc.sc(), "myModelPath");
<div data-lang="python">
-Note that the Python API does not yet support model save/load but will in the future.
-
{% highlight python %}
from pyspark.mllib.regression import LabeledPoint
-from pyspark.mllib.tree import DecisionTree
+from pyspark.mllib.tree import DecisionTree, DecisionTreeModel
from pyspark.mllib.util import MLUtils
# Load and parse the data file into an RDD of LabeledPoint.
@@ -317,6 +315,10 @@ testErr = labelsAndPredictions.filter(lambda (v, p): v != p).count() / float(tes
print('Test Error = ' + str(testErr))
print('Learned classification tree model:')
print(model.toDebugString())
+
+# Save and load model
+model.save(sc, "myModelPath")
+sameModel = DecisionTreeModel.load(sc, "myModelPath")
{% endhighlight %}
</div>
@@ -440,11 +442,9 @@ DecisionTreeModel sameModel = DecisionTreeModel.load(sc.sc(), "myModelPath");
<div data-lang="python">
-Note that the Python API does not yet support model save/load but will in the future.
-
{% highlight python %}
from pyspark.mllib.regression import LabeledPoint
-from pyspark.mllib.tree import DecisionTree
+from pyspark.mllib.tree import DecisionTree, DecisionTreeModel
from pyspark.mllib.util import MLUtils
# Load and parse the data file into an RDD of LabeledPoint.
@@ -464,6 +464,10 @@ testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() / flo
print('Test Mean Squared Error = ' + str(testMSE))
print('Learned regression tree model:')
print(model.toDebugString())
+
+# Save and load model
+model.save(sc, "myModelPath")
+sameModel = DecisionTreeModel.load(sc, "myModelPath")
{% endhighlight %}
</div>
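
The two lines added to each Python example (`model.save(sc, ...)` followed by `DecisionTreeModel.load(sc, ...)`) can be wrapped into a small round-trip check. The sketch below is illustrative only and is not part of this commit: the helper names `save_and_reload` and `same_tree` are hypothetical, and comparing `toDebugString()` output is just one plausible way to confirm that the reloaded tree matches the original.

```python
def save_and_reload(sc, model, model_cls, path):
    """Save `model` under `path`, then load a fresh copy through `model_cls`.

    Assumes `model` exposes save(sc, path) and `model_cls` exposes
    load(sc, path), as DecisionTreeModel does in the diff above.
    """
    model.save(sc, path)
    return model_cls.load(sc, path)


def same_tree(model, reloaded):
    """Heuristic check: two tree models match if their printed structure matches."""
    return model.toDebugString() == reloaded.toDebugString()
```

With a live `SparkContext` named `sc` and a trained `model`, as in the user-guide examples, this would be called as `sameModel = save_and_reload(sc, model, DecisionTreeModel, "myModelPath")` and then checked with `same_tree(model, sameModel)`.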