[SPARK-13095] [SQL] improve performance for broadcast join with dimension table - spark

author	Davies Liu <davies@databricks.com>	2016-02-08 14:09:14 -0800
committer	Davies Liu <davies.liu@gmail.com>	2016-02-08 14:09:14 -0800
commit	ff0af0ddfa4d198b203c3a39f8532cfbd4f4e027 (patch)
tree	bed882aeeb85eeb67562b1d2c58390d257896bca /project/MimaExcludes.scala
parent	37bc203c8dd5022cb11d53b697c28a737ee85bcc (diff)

[SPARK-13095] [SQL] improve performance for broadcast join with dimension table

This PR improves the performance of broadcast joins with dimension tables, which are common in data warehouses.

If the join key fits in a long, we use a special API, `get(Long)`, to get the rows from the HashedRelation.

If the HashedRelation has only unique keys, we use a special API, `getValue(Long)` or `getValue(InternalRow)`.

If the keys fit in a long and are also dense, we use an array of UnsafeRow instead of a hash map.

TODO: will do cleanup

Author: Davies Liu <davies@databricks.com>

Closes #11065 from davies/gen_dim.
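The dense-key case can be illustrated with a small standalone sketch. This is not Spark's code: the class name `UniqueLongRelation`, the `min_density` threshold of 0.5, and the use of plain tuples in place of UnsafeRow are all assumptions made for illustration. It only shows the general idea of replacing a hash lookup with an array index when unique integer keys cover most of their range.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[object, ...]  # stand-in for a row; Spark itself uses UnsafeRow


class UniqueLongRelation:
    """Hypothetical lookup table for unique integer keys (illustration only)."""

    def __init__(self, pairs: Sequence[Tuple[int, Row]], min_density: float = 0.5):
        pairs = list(pairs)
        keys = [k for k, _ in pairs]
        if len(set(keys)) != len(keys):
            raise ValueError("keys must be unique")

        self._offset = 0
        self._dense: Optional[List[Optional[Row]]] = None
        self._sparse: Optional[Dict[int, Row]] = None

        if keys:
            lo, hi = min(keys), max(keys)
            span = hi - lo + 1
            # Assumed rule: use an array when the keys fill enough of [lo, hi].
            if len(keys) / span >= min_density:
                self._offset = lo
                self._dense = [None] * span
                for k, row in pairs:
                    self._dense[k - lo] = row
                return

        # Fallback: an ordinary hash map from key to row.
        self._sparse = dict(pairs)

    @property
    def is_dense(self) -> bool:
        return self._dense is not None

    def get_value(self, key: int) -> Optional[Row]:
        """Return the single row for `key`, or None if the key is absent."""
        if self._dense is not None:
            idx = key - self._offset
            if 0 <= idx < len(self._dense):
                return self._dense[idx]
            return None
        return self._sparse.get(key)
```

In this sketch a lookup in the dense case is a subtraction and a bounds check, with no hashing or collision handling, at the cost of one empty slot for every missing key in the range.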

Diffstat (limited to 'project/MimaExcludes.scala')

0 files changed, 0 insertions, 0 deletions

