author: Patrick Wendell <pwendell@gmail.com> 2013-10-20 22:20:32 -0700
committer: Patrick Wendell <pwendell@gmail.com> 2013-10-20 22:20:32 -0700
commit: 35886f347466b25625d5391c97c2deb8293ebc66
tree: 2a77302e3c1caa6615089507278c6e10eaeaf5b1 /yarn
parent: 5b9380e0173b3d3d13235ae912e9ccc2a974b98b
parent: 9e9e9e1b42df26244d29b8920a41177e296a85c4
Merge pull request #41 from pwendell/shuffle-benchmark
Provide Instrumentation for Shuffle Write Performance

Shuffle write performance can have a major impact on the performance of jobs. This patch adds a few pieces of instrumentation related to shuffle writes. They are:

1. A listing of the time spent performing blocking writes for each task. This is implemented by keeping track of the aggregate delay seen by many individual writes.
2. An undocumented option `spark.shuffle.sync` which forces shuffle data to sync to disk. This is necessary for measuring shuffle performance in the absence of the OS buffer cache.
3. An internal utility which micro-benchmarks write throughput for simulated shuffle outputs.

I'm going to do some performance testing on this to see whether these small timing calls add overhead. From a feature perspective, however, I consider this complete. Any feedback is appreciated.
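The first two items can be illustrated with a small sketch. It is written in Python rather than Spark's Scala, and every name in it (`TimedShuffleWriter`, `commit`, `demo`, the output file name) is hypothetical, not the API this patch adds. It assumes that "time spent performing blocking writes" means wall-clock time spent inside write, flush, and fsync calls, summed across all calls for one task's output. The `sync` flag stands in for what `spark.shuffle.sync` is described as doing.

```python
import os
import tempfile
import time


class TimedShuffleWriter:
    """Hypothetical writer that accumulates time spent blocked in I/O calls.

    `sync` mirrors the described purpose of `spark.shuffle.sync`: when True,
    data is forced to the storage device with os.fsync() on commit instead
    of being left in the OS buffer cache.
    """

    def __init__(self, path, sync=False):
        self._file = open(path, "wb")
        self._sync = sync
        self.write_time_ns = 0  # aggregate delay across all individual calls
        self.bytes_written = 0

    def _timed(self, fn, *args):
        # Time one blocking call and add its duration to the running total.
        start = time.perf_counter_ns()
        result = fn(*args)
        self.write_time_ns += time.perf_counter_ns() - start
        return result

    def write(self, data):
        self._timed(self._file.write, data)
        self.bytes_written += len(data)

    def commit(self):
        # Push Python's userspace buffer to the OS, then optionally force
        # the OS to write the data to the device before returning.
        self._timed(self._file.flush)
        if self._sync:
            self._timed(os.fsync, self._file.fileno())
        self._file.close()


def demo(num_records=100, record_size=1024, sync=True):
    # Write fake records to a temporary file and return
    # (bytes written, accumulated blocking time in nanoseconds).
    with tempfile.TemporaryDirectory() as tmp:
        writer = TimedShuffleWriter(os.path.join(tmp, "shuffle_0_0_0"), sync=sync)
        for _ in range(num_records):
            writer.write(b"x" * record_size)
        writer.commit()
        return writer.bytes_written, writer.write_time_ns
```

Because the file is opened with buffered I/O, individual `write()` calls are usually cheap memory copies, so most of the measured delay tends to show up in `flush()` and, when `sync=True`, in `fsync()`. Forcing the sync is what makes the accumulated time reflect the disk rather than the page cache.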
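The third item, a micro-benchmark of write throughput for simulated shuffle outputs, might look roughly like the following. This is again a hypothetical Python sketch, not the utility in the patch. The function name, the file-count, file-size, and chunk-size parameters, and the chunked write loop are all assumptions made for illustration.

```python
import os
import tempfile
import time


def benchmark_shuffle_writes(num_files=8, bytes_per_file=1 << 20,
                             chunk_size=64 * 1024, sync=False):
    """Hypothetical micro-benchmark: write `num_files` simulated shuffle
    outputs of `bytes_per_file` bytes each.

    Returns (total MiB written, elapsed seconds, throughput in MiB/s).
    """
    chunk = os.urandom(chunk_size)  # random payload, reused for every write
    with tempfile.TemporaryDirectory() as tmp:
        start = time.perf_counter()
        for i in range(num_files):
            path = os.path.join(tmp, f"shuffle_0_{i}")
            with open(path, "wb") as f:
                remaining = bytes_per_file
                while remaining > 0:
                    n = min(chunk_size, remaining)
                    f.write(chunk[:n])
                    remaining -= n
                f.flush()
                if sync:
                    # Block until the OS has written this file to the device.
                    os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    total_mib = num_files * bytes_per_file / (1024 * 1024)
    # Guard against a zero elapsed time on very small inputs.
    return total_mib, elapsed, total_mib / max(elapsed, 1e-9)
```

Running it once with `sync=False` and once with `sync=True` gives a rough sense of how much the OS buffer cache inflates apparent throughput, which is the measurement problem the commit message describes.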
Diffstat (limited to 'yarn')
0 files changed, 0 insertions, 0 deletions