author Sameer Agarwal <sameer@databricks.com> 2016-04-15 15:55:31 -0700
committer Yin Huai <yhuai@databricks.com> 2016-04-15 15:55:31 -0700
commit 4df65184b6b865a26e4d5c99bbfd3c24ab7179dc
tree 16611a5f2cfa0be0006bc72f49b682b025f78ba1 /sql/hive/src/main
parent 8028a28885dbd90f20e38922240618fc310a0a65
[SPARK-14620][SQL] Use/benchmark a better hash in VectorizedHashMap
## What changes were proposed in this pull request?

This PR uses a better hashing algorithm while probing the AggregateHashMap:

```java
long h = 0;
h = (h ^ (0x9e3779b9)) + key_1 + (h << 6) + (h >>> 2);
h = (h ^ (0x9e3779b9)) + key_2 + (h << 6) + (h >>> 2);
h = (h ^ (0x9e3779b9)) + key_3 + (h << 6) + (h >>> 2);
...
h = (h ^ (0x9e3779b9)) + key_n + (h << 6) + (h >>> 2);
return h;
```

Depends on: https://github.com/apache/spark/pull/12345

## How was this patch tested?

    Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02 on Mac OS X 10.11.4
    Intel(R) Core(TM) i7-4960HQ CPU 2.60GHz
    Aggregate w keys:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -------------------------------------------------------------------------------------------
    codegen = F                       2417 / 2457          8.7         115.2       1.0X
    codegen = T hashmap = F           1554 / 1581         13.5          74.1       1.6X
    codegen = T hashmap = T            877 /  929         23.9          41.8       2.8X

Author: Sameer Agarwal <sameer@databricks.com>

Closes #12379 from sameeragarwal/hash.
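For illustration, the unrolled recurrence in the commit message can be written as a loop over the grouping keys. This is a minimal sketch, not the code generated by Spark: the class name `KeyHashSketch` and the method `hash` are hypothetical, and the keys are assumed to be `long` values. Note that in Java `0x9e3779b9` is an `int` literal with the high bit set, so it is sign-extended to a negative `long` before the XOR.

```java
// Hypothetical sketch of the hash recurrence from the commit message.
// The names KeyHashSketch and hash are illustrative, not part of Spark.
final class KeyHashSketch {
    private KeyHashSketch() {}

    // Folds each key into the running hash using the same step as the PR:
    //   h = (h ^ 0x9e3779b9) + key + (h << 6) + (h >>> 2)
    // Assumption: all keys are longs; an empty key list hashes to 0.
    static long hash(long... keys) {
        long h = 0;
        for (long key : keys) {
            // 0x9e3779b9 is an int literal (negative), widened to long here.
            h = (h ^ 0x9e3779b9) + key + (h << 6) + (h >>> 2);
        }
        return h;
    }
}
```

Because each step mixes the previous value of `h` through the shifts, the result depends on key order: `KeyHashSketch.hash(1L, 2L)` and `KeyHashSketch.hash(2L, 1L)` produce different values.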
Diffstat (limited to 'sql/hive/src/main')
0 files changed, 0 insertions, 0 deletions