author Xiangrui Meng <meng@databricks.com> 2014-03-18 17:20:42 -0700
committer Matei Zaharia <matei@databricks.com> 2014-03-18 17:20:42 -0700
commit f9d8a83c0006bb59c61e8770cd201b72333cb9a4 (patch)
tree 31690aca1930996c7e9955311928228fb14541e5 /core
parent e108b9ab94c4310ec56ef0eda99bb904133f942d (diff)
[SPARK-1266] persist factors in implicit ALS

In implicit ALS computation, the user or product factor is used twice in each iteration. Caching can certainly help accelerate the computation. I saw the running time decrease by ~70% for implicit ALS on the movielens data.

I also made the following changes:

1. Change `YtYb` type from `Broadcast[Option[DoubleMatrix]]` to `Option[Broadcast[DoubleMatrix]]`, so we don't need to broadcast None in explicit computation.

2. Mark methods `computeYtY`, `unblockFactors`, `updateBlock`, and `updateFeatures` as private. Users do not need those methods.

3. Materialize the final matrix factors before returning the model. This allows us to clean up other cached RDDs before returning the model. I do not have a better solution here, so I use `RDD.count()`.

JIRA: https://spark-project.atlassian.net/browse/SPARK-1266

Author: Xiangrui Meng <meng@databricks.com>

Closes #165 from mengxr/als and squashes the following commits:

c9676a6 [Xiangrui Meng] add a comment about the last products.persist
d3a88aa [Xiangrui Meng] change implicitPrefs match to if ... else ...
63862d6 [Xiangrui Meng] persist factors in implicit ALS
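
The `YtYb` type change in item 1 can be illustrated without Spark. The following is only a minimal sketch, not the code from this commit: `Broadcast` here is a hypothetical stand-in case class (not Spark's `org.apache.spark.broadcast.Broadcast`), `broadcastCount` and `prepareYtY` are names invented for the illustration, and a plain `Array[Array[Double]]` replaces `DoubleMatrix`. It shows that with `Option` on the outside, the explicit path returns `None` and never performs a broadcast.

```scala
object BroadcastSketch {
  // Hypothetical stand-in for Spark's Broadcast[T]; assumption for illustration only.
  final case class Broadcast[T](value: T)

  // Counts how many values were "shipped", so the saving is observable.
  var broadcastCount = 0

  def broadcast[T](v: T): Broadcast[T] = {
    broadcastCount += 1
    Broadcast(v)
  }

  // Plain nested arrays stand in for DoubleMatrix.
  type Matrix = Array[Array[Double]]

  // Gram matrix YtY = Y^T * Y, where each element of `factors` is one row of Y.
  def computeYtY(factors: Seq[Array[Double]], rank: Int): Matrix = {
    val yty = Array.ofDim[Double](rank, rank)
    for (f <- factors; i <- 0 until rank; j <- 0 until rank) {
      yty(i)(j) += f(i) * f(j)
    }
    yty
  }

  // Option on the outside: nothing is computed or broadcast for explicit feedback.
  def prepareYtY(
      factors: Seq[Array[Double]],
      rank: Int,
      implicitPrefs: Boolean): Option[Broadcast[Matrix]] = {
    if (implicitPrefs) Some(broadcast(computeYtY(factors, rank))) else None
  }

  def main(args: Array[String]): Unit = {
    val factors = Seq(Array(1.0, 2.0), Array(3.0, 4.0))

    val explicitYtY = prepareYtY(factors, 2, implicitPrefs = false)
    println(s"explicit: defined=${explicitYtY.isDefined}, broadcasts=$broadcastCount")

    val implicitYtY = prepareYtY(factors, 2, implicitPrefs = true)
    println(s"implicit: defined=${implicitYtY.isDefined}, broadcasts=$broadcastCount")
  }
}
```

With the earlier `Broadcast[Option[...]]` shape, the equivalent of `broadcast(None)` would run on every explicit iteration. The shape sketched here skips that call entirely.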
Diffstat (limited to 'core')
0 files changed, 0 insertions, 0 deletions