[SPARK-14052] [SQL] build a BytesToBytesMap directly in HashedRelation - spark


author	Davies Liu <davies@databricks.com>	2016-03-28 13:07:32 -0700
committer	Davies Liu <davies.liu@gmail.com>	2016-03-28 13:07:32 -0700
commit	d7b58f1461f71ee3c028360eef0ffedd17d6a076 (patch)
tree	58ddca8bb29534ecb77446e6706f33d885e01bd4 /sql/catalyst/pom.xml
parent	600c0b69cab4767e8e5a6f4284777d8b9d4bd40e (diff)
download	spark-d7b58f1461f71ee3c028360eef0ffedd17d6a076.tar.gz spark-d7b58f1461f71ee3c028360eef0ffedd17d6a076.tar.bz2 spark-d7b58f1461f71ee3c028360eef0ffedd17d6a076.zip

[SPARK-14052] [SQL] build a BytesToBytesMap directly in HashedRelation

## What changes were proposed in this pull request?

Currently, for keys that cannot fit within a long, we build a hash map for UnsafeHashedRelation, and it is converted to a BytesToBytesMap only after serialization and deserialization. We should build a BytesToBytesMap directly for better memory efficiency.

To do that, BytesToBytesMap needs to support multiple (K, V) pairs with the same K. `Location.putNewKey()` is renamed to `Location.append()`, which can append multiple values for the same key (same Location). `Location.newValue()` is added to find the next value for the same key.

## How was this patch tested?

Existing tests. Added a benchmark for broadcast hash join with duplicated keys.

Author: Davies Liu <davies@databricks.com>

Closes #11870 from davies/map2.
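The idea of appending several values under one key and then walking them one at a time can be sketched as follows. This is a toy illustration only, not Spark's BytesToBytesMap: the class `MultiValueMapSketch`, its `append` and `lookup` methods, and the use of `String` keys and values are all assumptions made for the example. It keeps values in append-only storage and links each new value to the previous one stored for the same key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration only; hypothetical names, not Spark's BytesToBytesMap. */
public class MultiValueMapSketch {
    // Append-only storage: values.get(i) holds a value, and prev.get(i) holds
    // the index of the value appended earlier for the same key (-1 if none).
    private final List<String> values = new ArrayList<>();
    private final List<Integer> prev = new ArrayList<>();
    // Maps each key to the index of its most recently appended value.
    private final Map<String, Integer> head = new HashMap<>();

    /** Appends a value for the key; earlier values for the key are kept. */
    public void append(String key, String value) {
        Integer earlier = head.get(key);
        values.add(value);
        prev.add(earlier == null ? -1 : earlier);
        head.put(key, values.size() - 1);
    }

    /** Returns all values for the key by following the chain, newest first. */
    public List<String> lookup(String key) {
        List<String> out = new ArrayList<>();
        Integer start = head.get(key);
        int i = (start == null) ? -1 : start;
        while (i != -1) {
            out.add(values.get(i));
            i = prev.get(i); // step to the next value stored for the same key
        }
        return out;
    }

    public static void main(String[] args) {
        MultiValueMapSketch m = new MultiValueMapSketch();
        m.append("k1", "a");
        m.append("k2", "b");
        m.append("k1", "c");
        System.out.println(m.lookup("k1")); // prints [c, a]
    }
}
```

In this sketch a duplicated key costs one extra slot in each list rather than a separate per-key collection, which is the kind of saving that matters for a join with many duplicated keys.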

Diffstat (limited to 'sql/catalyst/pom.xml')

0 files changed, 0 insertions, 0 deletions

