| Field | Value | Date |
|---|---|---|
| author | Andrew Or <andrewor14@gmail.com> | 2014-08-28 17:05:21 -0700 |
| committer | Patrick Wendell <pwendell@gmail.com> | 2014-08-28 17:05:21 -0700 |
| commit | a46b8f2d710d82ba3a212cac64b610a67b8798f9 | |
| tree | 50cfb2e6c5c6a55948c6cb95425e7a3624ffb60c /dev/run-tests-jenkins | |
| parent | 92af2314f27e80227174499f2fca505bd551cda7 | |
[SPARK-3277] Fix external spilling with LZ4 assertion error
**Summary of the changes**
The bulk of this PR consists of tests and documentation; the actual fix is a single added line of code (see `BlockObjectWriter.scala`). We currently do not run the `External*` test suites with different compression codecs, and doing so would have caught the bug reported in [SPARK-3277](https://issues.apache.org/jira/browse/SPARK-3277). This PR extends the existing tests to exercise spilling with every compression codec known to Spark, including `LZ4`.
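The idea of running one check against every codec, rather than only the default, can be sketched outside Spark as follows. This is a minimal illustration, not the PR's test code. The `Codec` interface, the class name, and the "none"/"deflate"/"gzip" entries are hypothetical. JDK codecs stand in for Spark's codecs because LZ4 is not part of the JDK. For each codec, the sketch samples the output size after `flush()` but before `close()` and reports every codec where that sample disagrees with the final file size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

public class AllCodecsCheck {

    /** Hypothetical codec abstraction: wraps a raw stream in a (possibly) compressing one. */
    interface Codec {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    /** JDK codecs used as stand-ins for Spark's codecs (LZ4 is not in the JDK). */
    static Map<String, Codec> allCodecs() {
        Map<String, Codec> codecs = new LinkedHashMap<>();
        codecs.put("none", raw -> raw);
        codecs.put("deflate", DeflaterOutputStream::new);
        codecs.put("gzip", GZIPOutputStream::new);
        return codecs;
    }

    /**
     * Runs the same check for every codec: a size sampled after flush() but before
     * close() should match the final file size. Returns the codecs where it does not.
     */
    static List<String> codecsWithMissedBytes(byte[] payload) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Codec> entry : allCodecs().entrySet()) {
            try {
                ByteArrayOutputStream file = new ByteArrayOutputStream();
                OutputStream out = entry.getValue().wrap(file);
                out.write(payload);
                out.flush();                          // without syncFlush, this does not drain the deflater
                long reportedBeforeClose = file.size();
                out.close();                          // compressing codecs finish their stream here
                if (reportedBeforeClose != file.size()) {
                    failures.add(entry.getKey());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        byte[] payload = "spill me ".repeat(1000).getBytes(StandardCharsets.UTF_8);
        System.out.println("codecs that write more bytes on close(): "
                + codecsWithMissedBytes(payload));
    }
}
```

With only the uncompressed "codec", the pre-close sample is always correct, so a suite that never varies the codec cannot expose this class of bug.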
**The bug itself**
In `DiskBlockObjectWriter`, we only record the shuffle bytes written before we close the streams. With `LZ4`, every bytes-written value reported by our metrics was 0 because `flush()` was not taking effect for some reason. More generally, a compression codec may write additional bytes to the file as part of `close()`, so our shuffle write metrics must also capture the bytes written at that point.
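The fix pattern can be sketched as a writer that re-reads its byte count after closing the compressed stream. This is a hypothetical JVM sketch, not Spark's `DiskBlockObjectWriter`. `MeteredWriter`, `CountingOutputStream`, and `writeAndReport` are illustrative names. `GZIPOutputStream` stands in for LZ4 because, like the codec described above, it writes bytes during `close()`: the remaining compressed data and an 8-byte trailer.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class MeteredWriterDemo {

    /** Counts every byte that reaches the underlying sink (the "file"). */
    static final class CountingOutputStream extends FilterOutputStream {
        private long count = 0;

        CountingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
            count += len;
        }

        long count() {
            return count;
        }
    }

    /** Hypothetical writer that compresses records and exposes a bytes-written metric. */
    static final class MeteredWriter implements AutoCloseable {
        private final CountingOutputStream counter;
        private final OutputStream compressed;
        private long bytesWritten = 0;

        MeteredWriter(OutputStream sink) throws IOException {
            counter = new CountingOutputStream(sink);
            compressed = new GZIPOutputStream(counter);
        }

        void write(byte[] record) throws IOException {
            compressed.write(record);
        }

        @Override
        public void close() throws IOException {
            compressed.flush();
            bytesWritten = counter.count();   // sampling only here misses the codec's final bytes
            compressed.close();               // gzip emits remaining compressed data and its trailer
            bytesWritten = counter.count();   // so re-read the count once the stream is closed
        }

        long bytesWritten() {
            return bytesWritten;
        }
    }

    /** Writes the payload and returns {reported bytesWritten, actual bytes in the sink}. */
    static long[] writeAndReport(byte[] payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        MeteredWriter writer;
        try {
            writer = new MeteredWriter(sink);
            writer.write(payload);
            writer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {writer.bytesWritten(), sink.size()};
    }

    public static void main(String[] args) {
        long[] result = writeAndReport("spill me ".repeat(1000).getBytes(StandardCharsets.UTF_8));
        System.out.println("reported " + result[0] + " bytes, sink holds " + result[1] + " bytes");
    }
}
```

Counting at the sink, below the compressor, means the metric reflects what actually landed in the file. Re-reading the count after `close()` covers whatever the codec emits while finishing its stream.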
Thanks mridulm and pwendell for help with debugging.
Author: Andrew Or <andrewor14@gmail.com>
Author: Patrick Wendell <pwendell@gmail.com>
Closes #2187 from andrewor14/fix-lz4-spilling and squashes the following commits:
1b54bdc [Andrew Or] Speed up tests by not compressing everything
1c4624e [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-lz4-spilling
6b2e7d1 [Andrew Or] Fix compilation error
92e251b [Patrick Wendell] Better documentation for BlockObjectWriter.
a1ad536 [Andrew Or] Fix tests
089593f [Andrew Or] Actually fix SPARK-3277 (tests still fail)
4bbcf68 [Andrew Or] Update tests to actually test all compression codecs
b264a84 [Andrew Or] ExternalAppendOnlyMapSuite code style fixes (minor)
1bfa743 [Andrew Or] Add more information to assert for better debugging
Diffstat (limited to 'dev/run-tests-jenkins')
0 files changed, 0 insertions, 0 deletions