[SPARK-7251] Perform sequential scan when iterating over BytesToBytesMap - spark

author	Josh Rosen <joshrosen@databricks.com>	2015-05-20 16:42:49 -0700
committer	Josh Rosen <joshrosen@databricks.com>	2015-05-20 16:43:09 -0700
commit	82bc518cf890b19abdf0d93b5c4429862b8e4441
tree	f4d0f5aa9d5c0f3792867804f9f95d44d2502d20 /repl
parent	7cea552e1edf1d4e0143349692ff751ca493a912

This patch modifies `BytesToBytesMap.iterator()` to iterate through records in the order in which they appear in the data pages, rather than iterating through the hashtable pointer arrays. This results in fewer random memory accesses, significantly improving performance for scan-and-copy operations.

This is possible because our data pages are laid out as sequences of `[keyLength][data][valueLength][data]` entries. To mark the end of a partially-filled data page, we write `-1` as a special end-of-page length (BytesToBytesMap supports empty/zero-length keys and values, which is why we had to use a negative length).

This patch incorporates / closes #5836.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6159 from JoshRosen/SPARK-7251 and squashes the following commits:

05bd90a [Josh Rosen] Compare capacity, not size, to MAX_CAPACITY
2a20d71 [Josh Rosen] Fix maximum BytesToBytesMap capacity
bc4854b [Josh Rosen] Guard against overflow when growing BytesToBytesMap
f5feadf [Josh Rosen] Add test for iterating over an empty map
273b842 [Josh Rosen] [SPARK-7251] Perform sequential scan when iterating over entries in BytesToBytesMap

(cherry picked from commit f2faa7af30662e3bdf15780f8719c71108f8e30b)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
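
The page layout described in the commit message can be illustrated with a small, self-contained sketch. This is not the code from the patch: the class `PageScanSketch`, its helper methods, the use of `java.nio.ByteBuffer` as a stand-in for a data page, and the 4-byte length fields are all assumptions made for illustration. It shows only the idea of walking `[keyLength][data][valueLength][data]` entries in order and stopping at a `-1` length.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and types are not taken from Spark.
final class PageScanSketch {
    // A negative length marks the end of a partially filled page, because
    // zero is a legitimate length for an empty key or value.
    static final int END_OF_PAGE = -1;

    // Append one [keyLength][key][valueLength][value] entry to the page.
    public static void appendRecord(ByteBuffer page, byte[] key, byte[] value) {
        page.putInt(key.length).put(key).putInt(value.length).put(value);
    }

    // Write the end-of-page marker if there is room left for a length field.
    public static void markEndOfPage(ByteBuffer page) {
        if (page.remaining() >= Integer.BYTES) {
            page.putInt(END_OF_PAGE);
        }
    }

    // Read records front to back, in the order they were written.
    // Each returned element is a {key, value} pair.
    public static List<byte[][]> scan(ByteBuffer page) {
        ByteBuffer in = page.duplicate(); // independent position and limit
        in.flip();                        // limit = bytes written, position = 0
        List<byte[][]> records = new ArrayList<>();
        while (in.remaining() >= Integer.BYTES) {
            int keyLength = in.getInt();
            if (keyLength == END_OF_PAGE) {
                break; // nothing but unused space after this point
            }
            byte[] key = new byte[keyLength];
            in.get(key);
            byte[] value = new byte[in.getInt()];
            in.get(value);
            records.add(new byte[][] {key, value});
        }
        return records;
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocate(64);
        appendRecord(page, "a".getBytes(StandardCharsets.UTF_8),
                "xy".getBytes(StandardCharsets.UTF_8));
        appendRecord(page, new byte[0], new byte[0]); // empty key and value
        markEndOfPage(page);
        for (byte[][] record : scan(page)) {
            System.out.println("key=" + new String(record[0], StandardCharsets.UTF_8)
                    + " value=" + new String(record[1], StandardCharsets.UTF_8));
        }
    }
}
```

The second record in `main` has a zero-length key and value, which is the case that rules out `0` as an end marker. Because the scan only moves forward through one contiguous buffer, it avoids the pointer chasing that iterating a hashtable's pointer array would involve.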

Diffstat (limited to 'repl')

0 files changed, 0 insertions, 0 deletions

