diff options
author | Tor Myklebust <tmyklebu@gmail.com> | 2014-04-29 22:04:34 -0700 |
---|---|---|
committer | Reynold Xin <rxin@apache.org> | 2014-04-29 22:04:34 -0700 |
commit | 5c0cd5c1a594c181a3f7536639122ab7d97b271b (patch) | |
tree | 47f7ccf36d7a8425497e2ecd4a2a10134510482b /core | |
parent | d33df1c151f8e982edd7324edc06d8cd3024dd34 (diff) | |
download | spark-5c0cd5c1a594c181a3f7536639122ab7d97b271b.tar.gz spark-5c0cd5c1a594c181a3f7536639122ab7d97b271b.tar.bz2 spark-5c0cd5c1a594c181a3f7536639122ab7d97b271b.zip |
[SPARK-1646] Micro-optimisation of ALS
This change replaces some Scala `for` and `foreach` constructs with `while` constructs. There may be a slight performance gain on the order of 1-2% when training an ALS model.
I trained an ALS model on the Movielens 10M-rating dataset repeatedly both with and without these changes. All 7 runs in both columns were done in a Scala `for` loop like this:
for (iter <- 0 to 10) {
val before = System.currentTimeMillis()
val model = ALS.train(rats, 20, 10)
val after = System.currentTimeMillis()
println("%d ms".format(after-before))
println("rmse %g".format(computeRmse(model, rats, numRatings)))
}
The timings were done on a multiuser machine, and I stopped one set of timings after 7 had been completed. It would be nice if somebody with dedicated hardware could confirm my timings.
After Before
121980 ms 122041 ms
117069 ms 117127 ms
115332 ms 117523 ms
115381 ms 117402 ms
114635 ms 116550 ms
114140 ms 114076 ms
112993 ms 117200 ms
Ratios are about 1.0005, 1.0005, 1.019, 1.0175, 1.01671, 0.99944, and 1.03723. I therefore suspect these changes make for a slight performance gain on the order of 1-2%.
Author: Tor Myklebust <tmyklebu@gmail.com>
Closes #568 from tmyklebu/alsopt and squashes the following commits:
5ded80f [Tor Myklebust] Fix style.
79595ff [Tor Myklebust] Fix style error.
4ef0313 [Tor Myklebust] Merge branch 'master' of github.com:apache/spark into alsopt
114fb74 [Tor Myklebust] Turn some 'for' loops into 'while' loops.
dcf583a [Tor Myklebust] Remove the partitioner member variable; instead, thread that needle everywhere it needs to go.
23d6f91 [Tor Myklebust] Stop making the partitioner configurable.
495784f [Tor Myklebust] Merge branch 'master' of https://github.com/apache/spark
674933a [Tor Myklebust] Fix style.
40edc23 [Tor Myklebust] Fix missing space.
f841345 [Tor Myklebust] Fix daft bug creating 'pairs', also for -> foreach.
5ec9e6c [Tor Myklebust] Clean a couple of things up using 'map'.
36a0f43 [Tor Myklebust] Make the partitioner private.
d872b09 [Tor Myklebust] Add negative id ALS test.
df27697 [Tor Myklebust] Support custom partitioners. Currently we use the same partitioner for users and products.
c90b6d8 [Tor Myklebust] Scramble user and product ids before bucketing.
c774d7d [Tor Myklebust] Make the partitioner a member variable and use it instead of modding directly.
Diffstat (limited to 'core')
0 files changed, 0 insertions, 0 deletions