author: Josh Rosen <joshrosen@eecs.berkeley.edu> 2012-10-13 14:57:33 -0700
committer: Josh Rosen <joshrosen@eecs.berkeley.edu> 2012-10-13 14:59:20 -0700
commit: 33cd3a0c12bf487a9060135c6cf2a3efa7943c77
tree: f99125fea55f30258d44fdd81921765865e95f68
parent: 10bcd217d2c9fcd7822d4399cfb9a0c9a05bc56e
Remove map-side combining from ShuffleMapTask.
This separation of concerns simplifies the ShuffleDependency and ShuffledRDD interfaces. Map-side combining can be performed in a mapPartitions() call prior to shuffling the RDD. I don't anticipate this having much of a performance impact: in both approaches, each tuple is hashed twice, once in the bucket partitioning and once in the combiner's hashtable. The same steps are performed, but in a different order and through one extra Iterator.
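
A minimal sketch of the ordering described above, written as plain Python iterators rather than Spark code. The function names (combine_partition, bucket_partition, reduce_bucket, shuffle_with_map_side_combine) are hypothetical and are not taken from this commit; the sketch only models the idea that combining happens in a per-partition pass before bucketing, so each record still meets two hash steps.

```python
def combine_partition(records, create_combiner, merge_value):
    """Per-partition combine pass (the role a mapPartitions() call would play).

    Hash step 1: each record's key is looked up in the combiner's hash table.
    """
    combiners = {}
    for key, value in records:
        if key in combiners:
            combiners[key] = merge_value(combiners[key], value)
        else:
            combiners[key] = create_combiner(value)
    return iter(combiners.items())


def bucket_partition(records, num_buckets):
    """Shuffle-write pass with no combining logic of its own.

    Hash step 2: each (key, combiner) pair is hashed into an output bucket.
    """
    buckets = [[] for _ in range(num_buckets)]
    for key, combiner in records:
        buckets[hash(key) % num_buckets].append((key, combiner))
    return buckets


def reduce_bucket(pairs, merge_combiners):
    """Reduce side: merge the partial combiners that arrived for each key."""
    merged = {}
    for key, combiner in pairs:
        if key in merged:
            merged[key] = merge_combiners(merged[key], combiner)
        else:
            merged[key] = combiner
    return merged


def shuffle_with_map_side_combine(partitions, num_buckets):
    """Sum values per key: combine first, then bucket, then merge."""
    shuffled = [[] for _ in range(num_buckets)]
    for part in partitions:
        # The combine pass feeds the bucketing pass through one extra iterator.
        combined = combine_partition(iter(part), lambda v: v, lambda c, v: c + v)
        for i, bucket in enumerate(bucket_partition(combined, num_buckets)):
            shuffled[i].extend(bucket)
    result = {}
    for bucket in shuffled:
        result.update(reduce_bucket(bucket, lambda a, b: a + b))
    return result


if __name__ == "__main__":
    parts = [[("a", 1), ("b", 1), ("a", 1)], [("a", 1), ("c", 1)]]
    print(shuffle_with_map_side_combine(parts, num_buckets=2))
```

Because bucket_partition knows nothing about combiners, the shuffle step keeps a narrower interface, which is the simplification the message argues for.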
Diffstat (limited to 'run2.cmd')
0 files changed, 0 insertions, 0 deletions