path: root/docs/mllib-dimensionality-reduction.md
author: Tor Myklebust <tmyklebu@gmail.com> 2014-04-22 11:07:30 -0700
committer: Patrick Wendell <pwendell@gmail.com> 2014-04-22 11:07:30 -0700
commit: bf9d49b6d1f668b49795c2d380ab7d64ec0029da (patch)
tree: c6f8d4424eb8055cb319cf9e700ca9c6ddf49308 /docs/mllib-dimensionality-reduction.md
parent: c919798f0912dc03c8365b9a384d9ee6d5b25c51 (diff)
[SPARK-1281] Improve partitioning in ALS
ALS was using HashPartitioner and explicit uses of `%` together. Further, the naked use of `%` meant that, if the number of partitions corresponded with the stride of arithmetic progressions appearing in user and product ids, users and products could be mapped into buckets in an unfair or unwise way.

This pull request:
1) Makes the Partitioner an instance variable of ALS.
2) Replaces the direct uses of `%` with calls to a Partitioner.
3) Defines an anonymous Partitioner that scrambles the bits of the object's hashCode before reducing to the number of present buckets.

This pull request does not make the partitioner user-configurable.

I'm not all that happy about the way I did (1). It introduces an icky lifetime issue and dances around it by nulling something. However, I don't know a better way to make the partitioner visible everywhere it needs to be visible.

Author: Tor Myklebust <tmyklebu@gmail.com>

Closes #407 from tmyklebu/master and squashes the following commits:

dcf583a [Tor Myklebust] Remove the partitioner member variable; instead, thread that needle everywhere it needs to go.
23d6f91 [Tor Myklebust] Stop making the partitioner configurable.
495784f [Tor Myklebust] Merge branch 'master' of https://github.com/apache/spark
674933a [Tor Myklebust] Fix style.
40edc23 [Tor Myklebust] Fix missing space.
f841345 [Tor Myklebust] Fix daft bug creating 'pairs', also for -> foreach.
5ec9e6c [Tor Myklebust] Clean a couple of things up using 'map'.
36a0f43 [Tor Myklebust] Make the partitioner private.
d872b09 [Tor Myklebust] Add negative id ALS test.
df27697 [Tor Myklebust] Support custom partitioners. Currently we use the same partitioner for users and products.
c90b6d8 [Tor Myklebust] Scramble user and product ids before bucketing.
c774d7d [Tor Myklebust] Make the partitioner a member variable and use it instead of modding directly.
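
The stride problem described in this commit message can be illustrated with a small standalone sketch. This is not the code from the patch: the object name `ScrambledBuckets`, the helper names, and the choice of the MurmurHash3 32-bit finalizer as the bit scrambler are all assumptions made for illustration, and the sketch deliberately avoids any Spark dependency. It compares a naked `%` with a scrambled hash when every id is a multiple of the bucket count.

```scala
object ScrambledBuckets {
  // Scala's % keeps the sign of the dividend, so fold negative remainders
  // back into [0, mod). This matters for negative ids.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val r = x % mod
    if (r < 0) r + mod else r
  }

  // Hypothetical scrambler: the MurmurHash3 32-bit finalizer. The patch only
  // says it "scrambles the bits"; this particular mixer is an assumption.
  def scramble(h: Int): Int = {
    var x = h
    x ^= x >>> 16
    x *= 0x85ebca6b
    x ^= x >>> 13
    x *= 0xc2b2ae35
    x ^= x >>> 16
    x
  }

  // Bucketing with a naked modulus, as in the behaviour being replaced.
  def nakedBucket(id: Int, numBuckets: Int): Int =
    nonNegativeMod(id, numBuckets)

  // Bucketing after scrambling the hashCode, as the commit message describes.
  def scrambledBucket(id: Int, numBuckets: Int): Int =
    nonNegativeMod(scramble(id.hashCode), numBuckets)

  // Number of ids landing in each bucket under a given bucketing function.
  def histogram(ids: Seq[Int], bucketOf: Int => Int): Map[Int, Int] =
    ids.groupBy(bucketOf).map { case (bucket, members) => bucket -> members.size }

  def main(args: Array[String]): Unit = {
    val numBuckets = 10
    // Ids in an arithmetic progression whose stride equals the bucket count.
    val ids = (0 until 1000).map(_ * numBuckets)

    println("naked %:   " + histogram(ids, nakedBucket(_, numBuckets)).toSeq.sorted)
    println("scrambled: " + histogram(ids, scrambledBucket(_, numBuckets)).toSeq.sorted)
  }
}
```

With the naked modulus every id in this progression lands in bucket 0, while the scrambled variant spreads the same ids over several buckets. How even that spread is depends on the mixer chosen.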
Diffstat (limited to 'docs/mllib-dimensionality-reduction.md')
0 files changed, 0 insertions, 0 deletions