[SPARK-1266] persist factors in implicit ALS - spark


author	Xiangrui Meng <meng@databricks.com>	2014-03-18 17:20:42 -0700
committer	Matei Zaharia <matei@databricks.com>	2014-03-18 17:20:42 -0700
commit	f9d8a83c0006bb59c61e8770cd201b72333cb9a4 (patch)
tree	31690aca1930996c7e9955311928228fb14541e5 /core
parent	e108b9ab94c4310ec56ef0eda99bb904133f942d (diff)

[SPARK-1266] persist factors in implicit ALS

In implicit ALS computation, the user or product factor is used twice in each iteration. Caching can certainly help accelerate the computation. I saw the running time decrease by ~70% for implicit ALS on the MovieLens data.

I also made the following changes:

1. Change `YtYb` type from `Broadcast[Option[DoubleMatrix]]` to `Option[Broadcast[DoubleMatrix]]`, so we don't need to broadcast None in explicit computation.
2. Mark methods `computeYtY`, `unblockFactors`, `updateBlock`, and `updateFeatures` as `private`. Users do not need those methods.
3. Materialize the final matrix factors before returning the model. This allows us to clean up other cached RDDs before returning the model. I do not have a better solution here, so I use `RDD.count()`.

JIRA: https://spark-project.atlassian.net/browse/SPARK-1266

Author: Xiangrui Meng <meng@databricks.com>

Closes #165 from mengxr/als and squashes the following commits:

c9676a6 [Xiangrui Meng] add a comment about the last products.persist
d3a88aa [Xiangrui Meng] change implicitPrefs match to if ... else ...
63862d6 [Xiangrui Meng] persist factors in implicit ALS
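
The type change to `YtYb` described in the message can be illustrated with a minimal, Spark-free sketch. It is not the code from this patch: `Broadcast`, `broadcast`, `ytyBefore`, `ytyAfter`, and the `Array[Double]` payload are hypothetical stand-ins for Spark's broadcast variable, `SparkContext.broadcast`, and `DoubleMatrix`. It only shows why `Option[Broadcast[T]]` avoids a broadcast in the explicit case while `Broadcast[Option[T]]` does not.

```scala
// Hypothetical stand-in for Spark's Broadcast; for illustration only.
final case class Broadcast[T](value: T)

object YtYSketch {
  // Counts how many values were "broadcast" by the stand-in helper below.
  var broadcastCount = 0

  // Hypothetical helper mimicking SparkContext.broadcast.
  def broadcast[T](v: T): Broadcast[T] = {
    broadcastCount += 1
    Broadcast(v)
  }

  // Placeholder for the YtY matrix (a 2x2 identity, flattened).
  private def yty: Array[Double] = Array(1.0, 0.0, 0.0, 1.0)

  // Old shape: a broadcast always happens, even when the payload is None.
  def ytyBefore(implicitPrefs: Boolean): Broadcast[Option[Array[Double]]] =
    broadcast(if (implicitPrefs) Some(yty) else None)

  // New shape: the broadcast happens only when implicit preferences are used.
  def ytyAfter(implicitPrefs: Boolean): Option[Broadcast[Array[Double]]] =
    if (implicitPrefs) Some(broadcast(yty)) else None

  def main(args: Array[String]): Unit = {
    ytyBefore(implicitPrefs = false)
    println(s"old shape, explicit ALS: $broadcastCount broadcast(s)")
    broadcastCount = 0
    ytyAfter(implicitPrefs = false)
    println(s"new shape, explicit ALS: $broadcastCount broadcast(s)")
  }
}
```

With the new shape, callers in the explicit path get `None` and never touch a broadcast variable, while the implicit path unwraps the `Some` to read the broadcast value.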

Diffstat (limited to 'core')

0 files changed, 0 insertions, 0 deletions

