field | value | date |
---|---|---|
author | Josh Rosen <joshrosen@eecs.berkeley.edu> | 2012-10-13 14:57:33 -0700 |
committer | Josh Rosen <joshrosen@eecs.berkeley.edu> | 2012-10-13 14:59:20 -0700 |
commit | 33cd3a0c12bf487a9060135c6cf2a3efa7943c77 | |
tree | f99125fea55f30258d44fdd81921765865e95f68 /run2.cmd | |
parent | 10bcd217d2c9fcd7822d4399cfb9a0c9a05bc56e | |
Remove map-side combining from ShuffleMapTask.
This separation of concerns simplifies the
ShuffleDependency and ShuffledRDD interfaces.
Map-side combining can be performed in a
mapPartitions() call prior to shuffling the RDD.
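A minimal sketch of what "map-side combining in a mapPartitions() call" could look like, written against plain Scala collections rather than Spark itself. The helper name `combineLocally` and the object `MapSideCombine` are hypothetical; the two function parameters are modeled on the `createCombiner`/`mergeValue` pair that `combineByKey()` takes. Applying only the first parameter list yields an `Iterator[(K, V)] => Iterator[(K, C)]`, the function shape `mapPartitions()` accepts.

```scala
import scala.collection.mutable

object MapSideCombine {
  // Hypothetical helper: combines all values for each key within ONE
  // partition, before any shuffle happens.
  def combineLocally[K, V, C](
      createCombiner: V => C,   // builds a combiner from the first value seen
      mergeValue: (C, V) => C   // folds a further value into an existing combiner
  )(iter: Iterator[(K, V)]): Iterator[(K, C)] = {
    // The combiner's hashtable: one entry per distinct key in this partition.
    val combiners = mutable.HashMap.empty[K, C]
    for ((k, v) <- iter) {
      combiners.get(k) match {
        case Some(c) => combiners(k) = mergeValue(c, v)
        case None    => combiners(k) = createCombiner(v)
      }
    }
    // One (key, combiner) pair per distinct key, ready to be shuffled.
    combiners.iterator
  }
}
```

For a per-key sum, `createCombiner` is the identity and `mergeValue` is addition, so a partition holding `("a", 1), ("b", 2), ("a", 3)` shrinks to two pairs before it is shuffled.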
I don't anticipate much of a performance impact.
In both approaches, each tuple is hashed twice:
once in the bucket partitioning and once in the
combiner's hashtable. The same steps are
performed, just in a different order and through
one extra Iterator.
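To make the "hashed twice" argument concrete, the second hash is the one that picks each pair's output bucket. The sketch below is a hypothetical stand-in for bucket partitioning (the names `BucketPartitioning`, `bucketFor`, and `partitionIntoBuckets` are illustrative, not Spark's code); it uses a non-negative modulo of `hashCode`, the usual approach for a hash partitioner. Feeding it the output of a map-side combiner or the raw tuples performs the same two hashing steps, only in a different order.

```scala
object BucketPartitioning {
  // Second hash: choose a reducer bucket from the key's hashCode.
  // Assumes non-null keys; the modulo is shifted to be non-negative
  // because hashCode may be negative.
  def bucketFor(key: Any, numBuckets: Int): Int = {
    val raw = key.hashCode % numBuckets
    if (raw < 0) raw + numBuckets else raw
  }

  // Splits one map task's output into per-reducer buckets.
  def partitionIntoBuckets[K, C](iter: Iterator[(K, C)], numBuckets: Int): Array[Vector[(K, C)]] = {
    val buckets = Array.fill(numBuckets)(Vector.empty[(K, C)])
    for (pair <- iter) {
      val b = bucketFor(pair._1, numBuckets)
      buckets(b) = buckets(b) :+ pair
    }
    buckets
  }
}
```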
Diffstat (limited to 'run2.cmd')
0 files changed, 0 insertions, 0 deletions