author Andrew Or <andrewor14@gmail.com> 2014-02-21 20:05:39 -0800
committer Patrick Wendell <pwendell@gmail.com> 2014-02-21 20:05:39 -0800
commit fefd22f4c3e95d904cb6f4f3fd88b89050907ae9
tree 3097922b6f43a806ba9df29be190981323fccaaa /docs/scala-programming-guide.md
parent c8a4c9b1f6005815f5a4a331970624d1706b6b13
[SPARK-1113] External spilling - fix Int.MaxValue hash code collision bug
This bug was originally reported by @guojc, who opened the PR that preceded this one at https://github.com/apache/incubator-spark/pull/612.

ExternalAppendOnlyMap uses key hash codes to order the buffer streams from which spilled files are read back into memory. When a buffer stream is empty, its minimum key hash code (minKeyHash) defaults to Int.MaxValue. However, Int.MaxValue is also a perfectly legitimate key hash code. When reading from a spilled map that contains such a key, a hash collision may occur, in which case we attempt to read from an empty stream and throw NoSuchElementException.

The fix is to maintain the invariant that empty buffer streams are never added back to the merge queue. This guarantees that we never read from an empty buffer stream again.

This PR also includes two new tests for hash collisions.

Author: Andrew Or <andrewor14@gmail.com>

Closes #624 from andrewor14/spilling-bug and squashes the following commits:

9e7263d [Andrew Or] Slightly optimize next()
2037ae2 [Andrew Or] Move a few comments around...
cf95942 [Andrew Or] Remove default value of Int.MaxValue for minKeyHash
c11f03b [Andrew Or] Fix Int.MaxValue hash collision bug in ExternalAppendOnlyMap
21c1a39 [Andrew Or] Add hash collision tests to ExternalAppendOnlyMapSuite
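The merge described in this commit message can be sketched in Scala to show why the invariant matters. This is a simplified illustration, not Spark's actual ExternalAppendOnlyMap code: `StreamBuffer`, `SpillMerge.mergeStreams`, and `Key` are hypothetical names, each spilled stream is assumed to be an in-memory iterator of (key, value) pairs already sorted by key hash code, and values for equal keys are combined with a caller-supplied function. A buffer's `minKeyHash` is only defined when the buffer is non-empty, and empty buffers are never placed (back) into the priority queue, so Int.MaxValue is treated as an ordinary hash code rather than a sentinel.

```scala
import scala.collection.mutable

// Hypothetical key type with a controllable hash code, used to force collisions.
final case class Key(name: String, hash: Int) {
  override def hashCode: Int = hash
}

// Hypothetical stand-in for one spilled file read back as a stream of
// (key, value) pairs, assumed to be sorted by key hash code.
final class StreamBuffer[K, V](pairs: Iterator[(K, V)]) {
  private val buffered = pairs.buffered
  def isEmpty: Boolean = !buffered.hasNext
  // No Int.MaxValue default: this throws on an empty buffer, and the queue
  // invariant in mergeStreams guarantees it is never called on one.
  def minKeyHash: Int = buffered.head._1.hashCode
  def next(): (K, V) = buffered.next()
}

object SpillMerge {
  /** Merges hash-sorted streams, combining values of equal keys with `combine`. */
  def mergeStreams[K, V](streams: Seq[Iterator[(K, V)]])(combine: (V, V) => V): Iterator[(K, V)] = {
    // PriorityQueue is a max-heap, so reverse the ordering to pop the smallest hash first.
    val queue = mutable.PriorityQueue.empty[StreamBuffer[K, V]](
      Ordering.by[StreamBuffer[K, V], Int](_.minKeyHash).reverse)
    // Invariant: only non-empty buffers ever enter the queue.
    streams.map(new StreamBuffer(_)).filterNot(_.isEmpty).foreach(queue.enqueue(_))

    new Iterator[(K, V)] {
      private var pending: Iterator[(K, V)] = Iterator.empty

      // Every queued buffer is non-empty, so a non-empty queue means more output.
      def hasNext: Boolean = pending.hasNext || queue.nonEmpty

      def next(): (K, V) = {
        if (!pending.hasNext) pending = nextHashGroup()
        pending.next()
      }

      // Collects every pair whose key hash equals the current minimum hash.
      private def nextHashGroup(): Iterator[(K, V)] = {
        if (queue.isEmpty) throw new NoSuchElementException("no more merged pairs")
        val hash = queue.head.minKeyHash
        val combined = mutable.LinkedHashMap.empty[K, V]
        val touched = mutable.ArrayBuffer.empty[StreamBuffer[K, V]]
        while (queue.nonEmpty && queue.head.minKeyHash == hash) {
          val buf = queue.dequeue()
          // Distinct keys may share a hash, so combine only keys that are equal.
          while (!buf.isEmpty && buf.minKeyHash == hash) {
            val (k, v) = buf.next()
            combined(k) = combined.get(k).map(combine(_, v)).getOrElse(v)
          }
          touched += buf
        }
        // Re-enqueue only buffers that still have data, preserving the invariant.
        touched.filterNot(_.isEmpty).foreach(queue.enqueue(_))
        combined.iterator
      }
    }
  }
}
```

With this invariant, a key whose hash code is Int.MaxValue is ordered like any other key: merging `Iterator(Key("z", Int.MaxValue) -> 7)` with an empty stream returns that single pair instead of attempting to read from the empty stream.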
Diffstat (limited to 'docs/scala-programming-guide.md')
0 files changed, 0 insertions, 0 deletions