| author | Sital Kedia <skedia@fb.com> | 2016-05-27 11:22:39 -0700 |
|---|---|---|
| committer | Andrew Or <andrew@databricks.com> | 2016-05-27 11:22:39 -0700 |
| commit | ce756daa4f012ebdc5a41bf5a89ff11b6dfdab8c (patch) | |
| tree | 964120ed4895c8eaf70b73692664d8fdefc7542a /core/src/test | |
| parent | 5bdbedf2201efa6c34392aa9eff709761f027e1d (diff) | |
# [SPARK-15569] Reduce frequency of updateBytesWritten function in Disk…
## What changes were proposed in this pull request?
While profiling a Spark job that spilled a large amount of intermediate data, we found that a significant portion of time was spent in the `DiskObjectWriter.updateBytesWritten` function. Looking at the code, we saw that the function was being called on every write to update the number of bytes written to disk. We should reduce the call frequency to avoid this overhead.
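The batching pattern behind this change can be sketched as follows. This is an illustrative Scala sketch, not Spark's actual implementation: the class and names (`MetricsThrottledWriter`, `reportInterval`) are hypothetical, and only the idea of folding buffered byte counts into the published metric every N records is taken from the PR description.

```scala
// Hypothetical sketch of the throttled-metrics pattern; not Spark's real API.
class MetricsThrottledWriter(reportInterval: Int = 16384) {
  private var recordsSinceLastUpdate = 0
  var recordsWritten = 0L
  var bytesWritten = 0L          // the "published" metric
  private var pendingBytes = 0L  // bytes written since the last metric update

  def write(recordBytes: Int): Unit = {
    pendingBytes += recordBytes
    recordsWritten += 1
    recordsSinceLastUpdate += 1
    // Fold the buffered byte count into the metric only every
    // `reportInterval` records, instead of on every single write.
    if (recordsSinceLastUpdate >= reportInterval) {
      updateBytesWritten()
      recordsSinceLastUpdate = 0
    }
  }

  private def updateBytesWritten(): Unit = {
    bytesWritten += pendingBytes
    pendingBytes = 0L
  }

  // Flush any remaining buffered count so the metric is exact on close.
  def close(): Unit = updateBytesWritten()
}
```

The trade-off is that `bytesWritten` is slightly stale between updates (the test below reflects this: metrics only catch up after the interval or on commit), in exchange for removing a per-record update from the hot write path.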
## How was this patch tested?
Tested by running the job on a cluster; this change yielded a roughly 20% CPU gain.
Author: Sital Kedia <skedia@fb.com>
Closes #13332 from sitalkedia/DiskObjectWriter.
Diffstat (limited to 'core/src/test')
-rw-r--r-- | core/src/test/scala/org/apache/spark/storage/DiskBlockObjectWriterSuite.scala | 12 |
1 file changed, 6 insertions(+), 6 deletions(-)
```diff
diff --git a/core/src/test/scala/org/apache/spark/storage/DiskBlockObjectWriterSuite.scala b/core/src/test/scala/org/apache/spark/storage/DiskBlockObjectWriterSuite.scala
index 8eff3c2970..ec4ef4b2fc 100644
--- a/core/src/test/scala/org/apache/spark/storage/DiskBlockObjectWriterSuite.scala
+++ b/core/src/test/scala/org/apache/spark/storage/DiskBlockObjectWriterSuite.scala
@@ -53,13 +53,13 @@ class DiskBlockObjectWriterSuite extends SparkFunSuite with BeforeAndAfterEach {
     assert(writeMetrics.recordsWritten === 1)
     // Metrics don't update on every write
     assert(writeMetrics.bytesWritten == 0)
-    // After 32 writes, metrics should update
-    for (i <- 0 until 32) {
+    // After 16384 writes, metrics should update
+    for (i <- 0 until 16384) {
       writer.flush()
       writer.write(Long.box(i), Long.box(i))
     }
     assert(writeMetrics.bytesWritten > 0)
-    assert(writeMetrics.recordsWritten === 33)
+    assert(writeMetrics.recordsWritten === 16385)
     writer.commitAndClose()
     assert(file.length() == writeMetrics.bytesWritten)
   }
@@ -75,13 +75,13 @@ class DiskBlockObjectWriterSuite extends SparkFunSuite with BeforeAndAfterEach {
     assert(writeMetrics.recordsWritten === 1)
     // Metrics don't update on every write
     assert(writeMetrics.bytesWritten == 0)
-    // After 32 writes, metrics should update
-    for (i <- 0 until 32) {
+    // After 16384 writes, metrics should update
+    for (i <- 0 until 16384) {
       writer.flush()
       writer.write(Long.box(i), Long.box(i))
     }
     assert(writeMetrics.bytesWritten > 0)
-    assert(writeMetrics.recordsWritten === 33)
+    assert(writeMetrics.recordsWritten === 16385)
     writer.revertPartialWritesAndClose()
     assert(writeMetrics.bytesWritten == 0)
     assert(writeMetrics.recordsWritten == 0)
```