author | Patrick Wendell <pwendell@gmail.com> | 2013-10-20 22:20:32 -0700 |
---|---|---|
committer | Patrick Wendell <pwendell@gmail.com> | 2013-10-20 22:20:32 -0700 |
commit | 35886f347466b25625d5391c97c2deb8293ebc66 | |
tree | 2a77302e3c1caa6615089507278c6e10eaeaf5b1 /yarn | |
parent | 5b9380e0173b3d3d13235ae912e9ccc2a974b98b | |
parent | 9e9e9e1b42df26244d29b8920a41177e296a85c4 | |
Merge pull request #41 from pwendell/shuffle-benchmark
Provide Instrumentation for Shuffle Write Performance
Shuffle write performance can have a major impact on job performance. This patch adds three pieces of instrumentation related to shuffle writes:
1. A per-task listing of the time spent performing blocking writes. This is implemented by tracking the aggregate delay across many individual writes.
2. An undocumented option `spark.shuffle.sync` which forces shuffle data to sync to disk. This is necessary for measuring shuffle performance in the absence of the OS buffer cache.
3. An internal utility which micro-benchmarks write throughput for simulated shuffle outputs.
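
To make the three items above concrete, here is a minimal Python sketch of the same ideas. It is not the code from this patch: `TimedWriter`, `benchmark`, the `sync` flag, and the file name `shuffle_0_0` are hypothetical names chosen for illustration. The wrapper accumulates the delay seen by each individual write, optionally forces the data to disk with `os.fsync` (loosely analogous to `spark.shuffle.sync`), and a small driver measures simulated output.

```python
import os
import tempfile
import time


class TimedWriter:
    """Wraps a binary file and accumulates time spent in blocking writes.

    Hypothetical illustration only; not the class used by the patch.
    """

    def __init__(self, path, sync=False):
        self._file = open(path, "wb")
        self._sync = sync        # assumed analogue of spark.shuffle.sync
        self.write_time_ns = 0   # aggregate delay across all writes
        self.bytes_written = 0

    def write(self, data):
        # Time each individual write and add it to the running total.
        start = time.perf_counter_ns()
        self._file.write(data)
        self.write_time_ns += time.perf_counter_ns() - start
        self.bytes_written += len(data)

    def close(self):
        # Flushing (and optionally syncing) is part of the write cost,
        # so it is counted toward the aggregate as well.
        start = time.perf_counter_ns()
        self._file.flush()
        if self._sync:
            os.fsync(self._file.fileno())  # bypass the OS buffer cache
        self._file.close()
        self.write_time_ns += time.perf_counter_ns() - start


def benchmark(num_records=10_000, record_size=100, sync=False):
    """Write simulated records to a temp file; return (bytes, nanoseconds)."""
    record = b"x" * record_size
    with tempfile.TemporaryDirectory() as tmp:
        writer = TimedWriter(os.path.join(tmp, "shuffle_0_0"), sync=sync)
        for _ in range(num_records):
            writer.write(record)
        writer.close()
    return writer.bytes_written, writer.write_time_ns


if __name__ == "__main__":
    for sync in (False, True):
        written, elapsed_ns = benchmark(sync=sync)
        print(f"sync={sync}: {written} bytes in {elapsed_ns / 1e6:.2f} ms")
```

Running the sketch with and without `sync=True` gives a rough sense of how much the buffer cache hides the cost of a write, which is the motivation given for the sync option. The timing calls themselves add a small per-write cost, so the reported totals are slightly inflated.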
I'm going to do some performance testing on this to see whether these small timing calls add overhead. From a feature perspective, however, I consider this complete. Any feedback is appreciated.
Diffstat (limited to 'yarn')
0 files changed, 0 insertions, 0 deletions