author | Xiangrui Meng <meng@databricks.com> | 2015-03-01 16:26:57 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-03-01 16:26:57 -0800 |
commit | aedbbaa3dda9cbc154cd52c07f6d296b972b0eb2 (patch) | |
tree | 4ba785e145c21b93e1e4c49ae33899642b1f3cea /docs | |
parent | fd8d283eeb98e310b1e85ef8c3a8af9e547ab5e0 (diff) | |
[SPARK-6053][MLLIB] support save/load in PySpark's ALS
A simple wrapper to save/load `MatrixFactorizationModel` in Python. jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #4811 from mengxr/SPARK-5991 and squashes the following commits:
f135dac [Xiangrui Meng] update save doc
57e5200 [Xiangrui Meng] address comments
06140a4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5991
282ec8d [Xiangrui Meng] support save/load in PySpark's ALS
Diffstat (limited to 'docs')
-rw-r--r-- | docs/mllib-collaborative-filtering.md | 8 |
1 files changed, 5 insertions, 3 deletions
diff --git a/docs/mllib-collaborative-filtering.md b/docs/mllib-collaborative-filtering.md
index 27aa4d38b7..76140282a2 100644
--- a/docs/mllib-collaborative-filtering.md
+++ b/docs/mllib-collaborative-filtering.md
@@ -200,10 +200,8 @@ In the following example we load rating data. Each row consists of a user, a pro
 We use the default ALS.train() method which assumes ratings are explicit. We evaluate the
 recommendation by measuring the Mean Squared Error of rating prediction.
 
-Note that the Python API does not yet support model save/load but will in the future.
-
 {% highlight python %}
-from pyspark.mllib.recommendation import ALS, Rating
+from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
 
 # Load and parse the data
 data = sc.textFile("data/mllib/als/test.data")
@@ -220,6 +218,10 @@ predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))
 ratesAndPreds = ratings.map(lambda r: ((r[0], r[1]), r[2])).join(predictions)
 MSE = ratesAndPreds.map(lambda r: (r[1][0] - r[1][1])**2).reduce(lambda x, y: x + y) / ratesAndPreds.count()
 print("Mean Squared Error = " + str(MSE))
+
+# Save and load model
+model.save(sc, "myModelPath")
+sameModel = MatrixFactorizationModel.load(sc, "myModelPath")
 {% endhighlight %}
 
 If the rating matrix is derived from other source of information (i.e., it is inferred from other
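
For readers who want to check the Mean Squared Error step from the documented example without a Spark context, the join-and-average logic can be sketched in plain Python. This is only an illustration and is not part of the commit: the `mean_squared_error` helper and the sample dictionaries are hypothetical stand-ins for the `ratings` and `predictions` RDDs, keyed by `(user, product)` in the same way.

```python
# Illustrative only: a plain-Python stand-in for the RDD join + MSE step
# in the docs example. `mean_squared_error` and the sample data below are
# hypothetical and not part of Spark or of this commit.

def mean_squared_error(ratings, predictions):
    """Average squared difference over (user, product) keys found in both dicts.

    Mirrors the inner join on (user, product) in the docs example:
    keys missing from either side are dropped before averaging.
    """
    common = ratings.keys() & predictions.keys()
    if not common:
        raise ValueError("no overlapping (user, product) pairs")
    return sum((ratings[k] - predictions[k]) ** 2 for k in common) / len(common)


# Keys are (user, product); values are ratings.
actual = {(1, 1): 5.0, (1, 2): 1.0, (2, 1): 4.0}
predicted = {(1, 1): 4.0, (1, 2): 1.0, (2, 1): 2.0, (3, 3): 3.0}

mse = mean_squared_error(actual, predicted)
print("Mean Squared Error = " + str(mse))
```

The pair `(3, 3)` appears only in `predicted`, so it is ignored, just as an inner join would drop it.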