Merge pull request #41 from pwendell/shuffle-benchmark - spark

author	Patrick Wendell <pwendell@gmail.com>	2013-10-20 22:20:32 -0700
committer	Patrick Wendell <pwendell@gmail.com>	2013-10-20 22:20:32 -0700
commit	35886f347466b25625d5391c97c2deb8293ebc66 (patch)
tree	2a77302e3c1caa6615089507278c6e10eaeaf5b1 /yarn
parent	5b9380e0173b3d3d13235ae912e9ccc2a974b98b (diff)
parent	9e9e9e1b42df26244d29b8920a41177e296a85c4 (diff)

Merge pull request #41 from pwendell/shuffle-benchmark

Provide Instrumentation for Shuffle Write Performance

Shuffle write performance can have a major impact on the performance of jobs. This patch adds a few pieces of instrumentation related to shuffle writes. They are:

1. A listing of the time spent performing blocking writes for each task. This is implemented by keeping track of the aggregate delay seen by many individual writes.

2. An undocumented option `spark.shuffle.sync` which forces shuffle data to sync to disk. This is necessary for measuring shuffle performance in the absence of the OS buffer cache.

3. An internal utility which micro-benchmarks write throughput for simulated shuffle outputs.

I'm going to do some performance testing on this to see whether these small timing calls add overhead. From a feature perspective, however, I consider this complete. Any feedback is appreciated.
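
For readers who want a concrete picture of the three pieces described above, here is a minimal sketch in Python. It is not the code from this patch: `TimedWriter`, `benchmark`, and the `sync` flag are hypothetical names chosen for illustration, and the `sync` flag is only a loose analogue of `spark.shuffle.sync`. The sketch accumulates the delay of many individual blocking writes into one total, optionally forces the data to disk with `os.fsync`, and wraps both in a small write benchmark.

```python
import os
import tempfile
import time


class TimedWriter:
    """Hypothetical wrapper that sums the time spent in blocking writes."""

    def __init__(self, path, sync=False):
        self._file = open(path, "wb")
        self._sync = sync         # assumption: loose analogue of spark.shuffle.sync
        self.write_time_ns = 0    # aggregate delay across all individual writes
        self.bytes_written = 0

    def write(self, data):
        # Time each write and add it to the running total.
        start = time.perf_counter_ns()
        self._file.write(data)
        self.write_time_ns += time.perf_counter_ns() - start
        self.bytes_written += len(data)

    def close(self):
        # Flushing (and optionally syncing) also blocks, so it is timed too.
        start = time.perf_counter_ns()
        self._file.flush()
        if self._sync:
            # Force data to disk so the OS buffer cache does not hide the cost.
            os.fsync(self._file.fileno())
        self._file.close()
        self.write_time_ns += time.perf_counter_ns() - start


def benchmark(num_records=1000, record_size=100, sync=False):
    """Write simulated records to a temp file; return (bytes, nanoseconds)."""
    record = b"x" * record_size
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        writer = TimedWriter(path, sync=sync)
        for _ in range(num_records):
            writer.write(record)
        writer.close()
        return writer.bytes_written, writer.write_time_ns
    finally:
        os.remove(path)
```

Calling `benchmark(sync=False)` and `benchmark(sync=True)` side by side gives a rough sense of how much of the write cost the buffer cache absorbs. The timing calls themselves add a small overhead per write, which is the concern raised in the commit message.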

Diffstat (limited to 'yarn')

0 files changed, 0 insertions, 0 deletions

