author | Sameer Agarwal <sameer@databricks.com> | 2016-04-15 15:55:31 -0700 |
---|---|---|
committer | Yin Huai <yhuai@databricks.com> | 2016-04-15 15:55:31 -0700 |
commit | 4df65184b6b865a26e4d5c99bbfd3c24ab7179dc (patch) | |
tree | 16611a5f2cfa0be0006bc72f49b682b025f78ba1 /docs/index.md | |
parent | 8028a28885dbd90f20e38922240618fc310a0a65 (diff) | |
[SPARK-14620][SQL] Use/benchmark a better hash in VectorizedHashMap
## What changes were proposed in this pull request?
This PR uses a better hashing algorithm while probing the AggregateHashMap:
```java
long h = 0;
// Fold each grouping key into the running hash, one statement per key.
h = (h ^ (0x9e3779b9)) + key_1 + (h << 6) + (h >>> 2);
h = (h ^ (0x9e3779b9)) + key_2 + (h << 6) + (h >>> 2);
h = (h ^ (0x9e3779b9)) + key_3 + (h << 6) + (h >>> 2);
...
h = (h ^ (0x9e3779b9)) + key_n + (h << 6) + (h >>> 2);
return h;
```
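As a minimal, self-contained sketch of the unrolled sequence above, the same mixing step can be written as a loop over the keys. The class name `HashSketch` and the varargs `hash` helper are hypothetical and exist only for illustration; they are not the code generated by this PR.

```java
// Hypothetical illustration class; not part of Spark.
final class HashSketch {
    // Applies the mixing step from the snippet above once per key, in order.
    static long hash(long... keys) {
        long h = 0;
        for (long key : keys) {
            // In Java, 0x9e3779b9 is an int literal, so it is sign-extended
            // to a long before the XOR, exactly as in the unrolled form.
            h = (h ^ (0x9e3779b9)) + key + (h << 6) + (h >>> 2);
        }
        return h;
    }
}
```

Because each step feeds the previous value of `h` back through the shifts, the result depends on key order: `HashSketch.hash(1L, 2L)` and `HashSketch.hash(2L, 1L)` produce different values.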
Depends on: https://github.com/apache/spark/pull/12345
## How was this patch tested?
Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02 on Mac OS X 10.11.4
Intel(R) Core(TM) i7-4960HQ CPU 2.60GHz
Aggregate w keys | Best/Avg Time(ms) | Rate(M/s) | Per Row(ns) | Relative |
---|---|---|---|---|
codegen = F | 2417 / 2457 | 8.7 | 115.2 | 1.0X |
codegen = T hashmap = F | 1554 / 1581 | 13.5 | 74.1 | 1.6X |
codegen = T hashmap = T | 877 / 929 | 23.9 | 41.8 | 2.8X |
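The Relative column can be cross-checked from the reported best times. The sketch below assumes that Relative is the baseline's best time divided by each configuration's best time, rounded to one decimal place; that assumption matches the numbers shown but is not stated in the commit message. The class and method names are hypothetical.

```java
// Hypothetical helper for cross-checking the benchmark table; not part of Spark.
final class RelativeSpeedup {
    // Assumed definition: baseline best time divided by this run's best time.
    static double relative(double baselineBestMs, double bestMs) {
        return baselineBestMs / bestMs;
    }

    // Rounds to one decimal place, matching the precision of the table.
    static double roundToTenth(double x) {
        return Math.round(x * 10.0) / 10.0;
    }
}
```

Under that assumption, 2417 / 1554 rounds to 1.6 and 2417 / 877 rounds to 2.8, which agrees with the reported values.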
Author: Sameer Agarwal <sameer@databricks.com>
Closes #12379 from sameeragarwal/hash.