aboutsummaryrefslogtreecommitdiff
path: root/mllib
diff options
context:
space:
mode:
authorTor Myklebust <tmyklebu@gmail.com>2014-04-29 22:04:34 -0700
committerReynold Xin <rxin@apache.org>2014-04-29 22:04:34 -0700
commit5c0cd5c1a594c181a3f7536639122ab7d97b271b (patch)
tree47f7ccf36d7a8425497e2ecd4a2a10134510482b /mllib
parentd33df1c151f8e982edd7324edc06d8cd3024dd34 (diff)
downloadspark-5c0cd5c1a594c181a3f7536639122ab7d97b271b.tar.gz
spark-5c0cd5c1a594c181a3f7536639122ab7d97b271b.tar.bz2
spark-5c0cd5c1a594c181a3f7536639122ab7d97b271b.zip
[SPARK-1646] Micro-optimisation of ALS
This change replaces some Scala `for` and `foreach` constructs with `while` constructs. There may be a slight performance gain on the order of 1-2% when training an ALS model. I trained an ALS model on the Movielens 10M-rating dataset repeatedly both with and without these changes. All 7 runs in both columns were done in a Scala `for` loop like this: for (iter <- 0 to 10) { val before = System.currentTimeMillis() val model = ALS.train(rats, 20, 10) val after = System.currentTimeMillis() println("%d ms".format(after-before)) println("rmse %g".format(computeRmse(model, rats, numRatings))) } The timings were done on a multiuser machine, and I stopped one set of timings after 7 had been completed. It would be nice if somebody with dedicated hardware could confirm my timings. After Before 121980 ms 122041 ms 117069 ms 117127 ms 115332 ms 117523 ms 115381 ms 117402 ms 114635 ms 116550 ms 114140 ms 114076 ms 112993 ms 117200 ms Ratios are about 1.0005, 1.0005, 1.019, 1.0175, 1.01671, 0.99944, and 1.03723. I therefore suspect these changes make for a slight performance gain on the order of 1-2%. Author: Tor Myklebust <tmyklebu@gmail.com> Closes #568 from tmyklebu/alsopt and squashes the following commits: 5ded80f [Tor Myklebust] Fix style. 79595ff [Tor Myklebust] Fix style error. 4ef0313 [Tor Myklebust] Merge branch 'master' of github.com:apache/spark into alsopt 114fb74 [Tor Myklebust] Turn some 'for' loops into 'while' loops. dcf583a [Tor Myklebust] Remove the partitioner member variable; instead, thread that needle everywhere it needs to go. 23d6f91 [Tor Myklebust] Stop making the partitioner configurable. 495784f [Tor Myklebust] Merge branch 'master' of https://github.com/apache/spark 674933a [Tor Myklebust] Fix style. 40edc23 [Tor Myklebust] Fix missing space. f841345 [Tor Myklebust] Fix daft bug creating 'pairs', also for -> foreach. 5ec9e6c [Tor Myklebust] Clean a couple of things up using 'map'. 36a0f43 [Tor Myklebust] Make the partitioner private. d872b09 [Tor Myklebust] Add negative id ALS test. df27697 [Tor Myklebust] Support custom partitioners. Currently we use the same partitioner for users and products. c90b6d8 [Tor Myklebust] Scramble user and product ids before bucketing. c774d7d [Tor Myklebust] Make the partitioner a member variable and use it instead of modding directly.
Diffstat (limited to 'mllib')
-rw-r--r--mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala22
1 files changed, 17 insertions, 5 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
index 2a77e1a9ef..0cf9a7f909 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
@@ -472,13 +472,15 @@ class ALS private (
// Compute the XtX and Xy values for each user by adding products it rated in each product
// block
for (productBlock <- 0 until numBlocks) {
- for (p <- 0 until blockFactors(productBlock).length) {
+ var p = 0
+ while (p < blockFactors(productBlock).length) {
val x = wrapDoubleArray(blockFactors(productBlock)(p))
tempXtX.fill(0.0)
dspr(1.0, x, tempXtX)
val (us, rs) = inLinkBlock.ratingsForBlock(productBlock)(p)
- for (i <- 0 until us.length) {
- if (implicitPrefs) {
+ if (implicitPrefs) {
+ var i = 0
+ while (i < us.length) {
// Extension to the original paper to handle rs(i) < 0. confidence is a function
// of |rs(i)| instead so that it is never negative:
val confidence = 1 + alpha * abs(rs(i))
@@ -489,11 +491,17 @@ class ALS private (
if (rs(i) > 0) {
SimpleBlas.axpy(confidence, x, userXy(us(i)))
}
- } else {
+ i += 1
+ }
+ } else {
+ var i = 0
+ while (i < us.length) {
userXtX(us(i)).addi(tempXtX)
SimpleBlas.axpy(rs(i), x, userXy(us(i)))
+ i += 1
}
}
+ p += 1
}
}
@@ -502,7 +510,11 @@ class ALS private (
// Compute the full XtX matrix from the lower-triangular part we got above
fillFullMatrix(userXtX(index), fullXtX)
// Add regularization
- (0 until rank).foreach(i => fullXtX.data(i*rank + i) += lambda)
+ var i = 0
+ while (i < rank) {
+ fullXtX.data(i * rank + i) += lambda
+ i += 1
+ }
// Solve the resulting matrix, which is symmetric and positive-definite
if (implicitPrefs) {
Solve.solvePositive(fullXtX.addi(YtY.get.value), userXy(index)).data