[SPARK-14137] [SQL] Cleanup hash join - spark

author	Davies Liu <davies@databricks.com>	2016-04-04 10:01:24 -0700
committer	Davies Liu <davies.liu@gmail.com>	2016-04-04 10:01:24 -0700
commit	745425332f41e2ae94649f9d1ad675243f36f743 (patch)
tree	78f29665e7d8dc7bb8cb9c7cfb4ec9ef5cce15c3 /mllib/src/test
parent	0340b3d279de6be4903673bbf3e6a1a2653de6c0 (diff)

[SPARK-14137] [SQL] Cleanup hash join

## What changes were proposed in this pull request?

This PR makes a few cleanups to HashedRelation and HashJoin:

1) Merge HashedRelation and UniqueHashedRelation together.
2) Return an iterator from HashedRelation, so we do not need to create many UnsafeRow objects.
3) Return a copy of HashedRelation for thread-safety in BroadcastJoin, so we can re-use the UnsafeRow objects.
4) Clean up HashJoin, sharing most of the code between BroadcastHashJoin and ShuffleHashJoin.
5) Remove UniqueLongHashedRelation, which will be replaced by LongUnsafeMap (another PR).
6) Update the benchmark; before this patch, the selectivity of the joins was too high.

## How was this patch tested?

Existing tests.

Author: Davies Liu <davies@databricks.com>

Closes #12102 from davies/cleanup_hash.
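As a rough illustration of items 2) and 3) above, the following Python sketch shows a lookup that returns an iterator over one reusable row buffer, plus a cheap per-thread copy that shares the build-side data but owns its own buffer. This is not Spark's code: `ToyHashedRelation`, `build`, `get`, and `copy_for_thread` are hypothetical names chosen for the example, and plain tuples and lists stand in for UnsafeRow.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


class ToyHashedRelation:
    """Hypothetical stand-in for a hashed relation; not Spark's actual class."""

    def __init__(self, index: Dict[object, List[Tuple]]):
        self._index = index  # build-side data, shared and treated as read-only
        self._row: list = []  # reusable row buffer owned by this instance

    @classmethod
    def build(cls, rows: Iterable[Tuple], key_of: Callable[[Tuple], object]):
        # Group build-side rows by join key.
        index: Dict[object, List[Tuple]] = {}
        for row in rows:
            index.setdefault(key_of(row), []).append(row)
        return cls(index)

    def get(self, key) -> Iterator[list]:
        # Yield every match through the same buffer instead of allocating
        # a new row object per match; callers must copy a row to keep it.
        for stored in self._index.get(key, ()):
            self._row[:] = stored
            yield self._row

    def copy_for_thread(self) -> "ToyHashedRelation":
        # Share the index but give the copy its own buffer, so two threads
        # never write to the same reusable row.
        return ToyHashedRelation(self._index)


relation = ToyHashedRelation.build(
    [(1, "a"), (1, "b"), (2, "c")], key_of=lambda r: r[0]
)
local = relation.copy_for_thread()
# Copy each row out of the buffer before advancing the iterator.
matches = [tuple(row) for row in local.get(1)]
```

The trade-off in this sketch is that rows yielded by `get` are only valid until the iterator advances, which is why the usage lines copy each row into a tuple before collecting it.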

Diffstat (limited to 'mllib/src/test')

0 files changed, 0 insertions, 0 deletions

