path: root/project/SparkBuild.scala
author     Matei Zaharia <matei@eecs.berkeley.edu>  2013-10-18 20:30:56 -0700
committer  Matei Zaharia <matei@eecs.berkeley.edu>  2013-10-18 20:30:56 -0700
commit     e5316d0685c41a40e54a064cf271f3d62df6c8e8
tree       21b3bab4eb604e41ad2b7ecf93b8ee9a0b60c298 /project/SparkBuild.scala
parent     8d528af829dc989d4701c08fd90d230c15df7f7e
parent     08391dbcb8f28781382a362359d18f71ae37745b
Merge pull request #68 from mosharaf/master
Faster and stable/reliable broadcast

HttpBroadcast is noticeably slow, but the alternatives (TreeBroadcast and BitTorrentBroadcast) are notoriously unreliable. The main problem with them is that they try to manage the memory for the pieces of a broadcast themselves. Right now, the BroadcastManager does not know on which machines the tasks reading from a broadcast variable are running, or when they have finished. Consequently, we try to guess and often guess wrong, which blows up memory usage and kills or hangs jobs.

This very simple implementation solves the problem by not trying to manage the intermediate pieces; instead, it offloads that duty to the BlockManager, which is quite good at juggling blocks. Otherwise, it is very similar to the BitTorrentBroadcast implementation (without the fancy optimizations), and it runs much faster than the HttpBroadcast we have right now.

I've been using this for another project for the last couple of weeks, and just today did some benchmarking against the HTTP one. The following shows the improvements for increasing broadcast sizes on cold runs. Each line represents a different number of receivers.

![fix-bc-first](https://f.cloud.github.com/assets/232966/1349342/ffa149e4-36e7-11e3-9fa6-c74555829356.png)

After the first broadcast is over, i.e., after the JVM is warmed up and (I think) the HttpBroadcast server is already running, the following are the improvements for warm runs.

![fix-bc-succ](https://f.cloud.github.com/assets/232966/1349352/5a948bae-36e8-11e3-98ce-34f19ebd33e0.jpg)

The curves are not as nice as for the cold runs, but the improvements are obvious, especially for larger broadcasts and more receivers.

Depending on how it goes, we should deprecate and/or remove the old TreeBroadcast and BitTorrentBroadcast implementations, and hopefully SPARK-889 will no longer be necessary.
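The design described above (split the broadcast into pieces, let a block store own those pieces, and let receivers fetch each piece from the driver or from any peer that already has it) can be sketched as a toy, single-process model. Everything here is an assumption for illustration: `LocalBlockStore`, `PieceBroadcastSketch`, `publish`, `fetch`, and the `broadcast_<id>_piece<i>` naming are hypothetical stand-ins, not Spark's BlockManager or broadcast API, and there is no networking, serialization, or eviction.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical stand-in for a per-node block store, playing the role the PR
// assigns to the BlockManager. This is NOT Spark's actual API.
final class LocalBlockStore(val nodeId: String) {
  private val blocks = mutable.Map.empty[String, Array[Byte]]
  def put(id: String, data: Array[Byte]): Unit = blocks(id) = data
  def get(id: String): Option[Array[Byte]] = blocks.get(id)
}

object PieceBroadcastSketch {
  // Hypothetical naming scheme for the blocks that make up one broadcast.
  def pieceId(broadcastId: Long, index: Int): String =
    s"broadcast_${broadcastId}_piece$index"

  // Driver side: split the payload into fixed-size pieces and hand each one
  // to the block store, which then owns the piece's lifetime.
  def publish(store: LocalBlockStore, broadcastId: Long,
              payload: Array[Byte], pieceSize: Int): Int = {
    require(pieceSize > 0, "pieceSize must be positive")
    val pieces = payload.grouped(pieceSize).toArray
    pieces.zipWithIndex.foreach { case (bytes, i) =>
      store.put(pieceId(broadcastId, i), bytes)
    }
    pieces.length
  }

  // Receiver side: fetch each piece from a random node that already holds it
  // (the driver or a peer), keep a local copy so later receivers can fetch
  // from this node, then concatenate the pieces back into the payload.
  def fetch(local: LocalBlockStore, sources: Seq[LocalBlockStore],
            broadcastId: Long, numPieces: Int, rng: Random): Array[Byte] = {
    val parts = (0 until numPieces).map { i =>
      val id = pieceId(broadcastId, i)
      local.get(id).getOrElse {
        val holders = sources.filter(_.get(id).isDefined)
        require(holders.nonEmpty, s"no source holds $id")
        val bytes = holders(rng.nextInt(holders.size)).get(id).get
        local.put(id, bytes) // re-share: this node can now serve the piece
        bytes
      }
    }
    val out = new java.io.ByteArrayOutputStream()
    parts.foreach(p => out.write(p, 0, p.length))
    out.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val driver = new LocalBlockStore("driver")
    val payload = Array.tabulate[Byte](10000)(i => (i % 127).toByte)
    val n = publish(driver, 0L, payload, 4096)
    val rng = new Random(42)
    val workers = (1 to 3).map(i => new LocalBlockStore(s"worker-$i"))
    // Each worker may pull pieces from the driver or from workers that
    // already fetched them.
    val results = workers.map { w =>
      fetch(w, driver +: workers.filter(_ ne w), 0L, n, rng)
    }
    val allMatch = results.forall(r => java.util.Arrays.equals(r, payload))
    println(s"pieces=$n, all match=$allMatch") // prints "pieces=3, all match=true"
  }
}
```

Because each receiver stores every piece it fetches in its own store, later receivers can pull from it instead of only from the driver. Deciding when pieces may be dropped is left entirely to the store, which mirrors the responsibility this change hands to the BlockManager instead of tracking it inside the broadcast code.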
Diffstat (limited to 'project/SparkBuild.scala')
0 files changed, 0 insertions, 0 deletions