author | Xiangrui Meng <meng@databricks.com> | 2014-03-18 17:20:42 -0700 |
---|---|---|
committer | Matei Zaharia <matei@databricks.com> | 2014-03-18 17:20:42 -0700 |
commit | f9d8a83c0006bb59c61e8770cd201b72333cb9a4 | |
tree | 31690aca1930996c7e9955311928228fb14541e5 /core | |
parent | e108b9ab94c4310ec56ef0eda99bb904133f942d | |
[SPARK-1266] persist factors in implicit ALS
In implicit ALS, the user or product factors are used twice in each iteration, so caching them can speed up the computation. On the MovieLens data, I saw the running time of implicit ALS decrease by ~70%.
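For context on why the factors are read twice: in implicit-feedback ALS, each user's solve needs `Y^T C_u Y`, which is commonly expanded as `Y^T Y + Y^T (C_u - I) Y` so that the dense Gram matrix `Y^T Y` can be computed once per iteration over all factors. That pass is one read of the factors; shipping them to the blocks that solve for the other side is presumably the second. The sketch below is a purely local, plain-Scala illustration of the Gram-matrix step; `localYtY` is a hypothetical name and is not Spark's `computeYtY`, which operates on an RDD of factor blocks with jblas matrices.

```scala
object YtYSketch {
  // Hypothetical local stand-in for computeYtY: sums the outer products y * y^T
  // over all factor vectors y, giving the rank x rank matrix Y^T Y.
  def localYtY(factors: Seq[Array[Double]]): Array[Array[Double]] = {
    require(factors.nonEmpty, "need at least one factor vector")
    val rank = factors.head.length
    val ytY = Array.ofDim[Double](rank, rank)
    for (y <- factors; i <- 0 until rank; j <- 0 until rank) {
      ytY(i)(j) += y(i) * y(j)
    }
    ytY
  }

  def main(args: Array[String]): Unit = {
    val factors = Seq(Array(1.0, 2.0), Array(3.0, 4.0))
    // Prints the symmetric 2 x 2 Gram matrix, one row per line.
    println(localYtY(factors).map(_.mkString(" ")).mkString("\n"))
  }
}
```

In the distributed setting this sum is a full pass over the factor RDD, which is why recomputing the factors' lineage for it (instead of reading them from cache) is costly.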
I also made the following changes:
1. Change the type of `YtYb` from `Broadcast[Option[DoubleMatrix]]` to `Option[Broadcast[DoubleMatrix]]`, so we don't need to broadcast `None` in the explicit case.
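The difference between the two types can be sketched without Spark. In the sketch below, `FakeBroadcast` and its counter are hypothetical stand-ins for Spark's `Broadcast` (and a `Double` stands in for `DoubleMatrix`); the point is only that wrapping the `Option` outside the broadcast means nothing is shipped when `implicitPrefs` is false.

```scala
object BroadcastOptionSketch {
  // Hypothetical stand-in for org.apache.spark.broadcast.Broadcast.
  final case class FakeBroadcast[T](value: T)

  // Counts how many values would have been shipped to the executors.
  var broadcastsCreated = 0

  def broadcast[T](value: T): FakeBroadcast[T] = {
    broadcastsCreated += 1
    FakeBroadcast(value)
  }

  // Old shape, Broadcast[Option[M]]: a broadcast is created even when it holds None.
  def before(implicitPrefs: Boolean, ytY: => Double): FakeBroadcast[Option[Double]] =
    broadcast(if (implicitPrefs) Some(ytY) else None)

  // New shape, Option[Broadcast[M]]: broadcast only when there is a value to send.
  def after(implicitPrefs: Boolean, ytY: => Double): Option[FakeBroadcast[Double]] =
    if (implicitPrefs) Some(broadcast(ytY)) else None

  def main(args: Array[String]): Unit = {
    before(implicitPrefs = false, ytY = 42.0)
    println(s"before, explicit: $broadcastsCreated broadcast(s)")
    broadcastsCreated = 0
    after(implicitPrefs = false, ytY = 42.0)
    println(s"after, explicit: $broadcastsCreated broadcast(s)")
  }
}
```

Passing `ytY` by name also means the Gram matrix itself is never computed in the explicit case under the new shape.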
2. Mark the methods `computeYtY`, `unblockFactors`, `updateBlock`, and `updateFeatures` private. Users do not need them.
3. Materialize the final matrix factors before returning the model. This lets us unpersist the other cached RDDs before the model is returned. I do not have a better solution here, so I use `RDD.count()` to force materialization.
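Why the final factors must be materialized before the intermediates are unpersisted can be shown with a toy model. `LazyNode` below is hypothetical and is not Spark's API: it only mimics an RDD that is computed lazily and optionally cached, with `get()` playing the role of an action such as `count()`. The assumption is that the intermediate factors were already computed and used during the iterations; if they are dropped from the cache before the final factors are forced, the final factors' lineage recomputes them.

```scala
object MaterializeSketch {
  // Toy model (not Spark's API) of a lazily computed, optionally cached dataset.
  final class LazyNode[T](compute: () => T) {
    private var cache: Option[T] = None
    private var persisted = false
    var evaluations = 0 // how many times `compute` actually ran

    def persist(): this.type = { persisted = true; this }
    def unpersist(): this.type = { persisted = false; cache = None; this }

    // Analogous to an action such as RDD.count(): forces evaluation.
    def get(): T = cache match {
      case Some(v) => v
      case None =>
        evaluations += 1
        val v = compute()
        if (persisted) cache = Some(v)
        v
    }
  }

  // Returns how many times the intermediate was computed.
  def run(materializeFirst: Boolean): Int = {
    val intermediate = new LazyNode(() => (1 to 1000).sum).persist()
    intermediate.get() // already used during earlier iterations
    val finalFactors = new LazyNode(() => intermediate.get() * 2).persist()
    if (materializeFirst) finalFactors.get() // like calling count() first
    intermediate.unpersist()
    finalFactors.get()
    intermediate.evaluations
  }

  def main(args: Array[String]): Unit = {
    println(s"materialize first: ${run(materializeFirst = true)} evaluation(s)")
    println(s"unpersist first:   ${run(materializeFirst = false)} evaluation(s)")
  }
}
```

Forcing the final factors first keeps the intermediate at a single evaluation; unpersisting first causes it to be recomputed.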
JIRA: https://spark-project.atlassian.net/browse/SPARK-1266
Author: Xiangrui Meng <meng@databricks.com>
Closes #165 from mengxr/als and squashes the following commits:
c9676a6 [Xiangrui Meng] add a comment about the last products.persist
d3a88aa [Xiangrui Meng] change implicitPrefs match to if ... else ...
63862d6 [Xiangrui Meng] persist factors in implicit ALS