author | Cheng Hao <hao.cheng@intel.com> | 2014-08-11 20:45:14 -0700 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2014-08-11 20:45:14 -0700 |
commit | 5d54d71ddbac1fbb26925a8c9138bbb8c0e81db8 | |
tree | 01916ba2c12f4bd1a967adc46bb095eeac13d8a3 /mllib | |
parent | bad21ed085a505559dccc06223b486170371ddd2 | |
[SQL] [SPARK-2826] Reduce the memory copy while building the hashmap for HashOuterJoin
This is a follow-up to #1147. This PR improves performance by about 10% - 15% in my local tests.
```
Before:
LeftOuterJoin: took 16750 ms ([3000000] records)
LeftOuterJoin: took 15179 ms ([3000000] records)
RightOuterJoin: took 15515 ms ([3000000] records)
RightOuterJoin: took 15276 ms ([3000000] records)
FullOuterJoin: took 19150 ms ([6000000] records)
FullOuterJoin: took 18935 ms ([6000000] records)
After:
LeftOuterJoin: took 15218 ms ([3000000] records)
LeftOuterJoin: took 13503 ms ([3000000] records)
RightOuterJoin: took 13663 ms ([3000000] records)
RightOuterJoin: took 14025 ms ([3000000] records)
FullOuterJoin: took 16624 ms ([6000000] records)
FullOuterJoin: took 16578 ms ([6000000] records)
```
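As a rough cross-check of the figures above, the relative reduction in run time can be computed per join type from the listed timings. This is only a sketch: the timings are copied from the benchmark output, while the names `before`, `after`, and `reduction_percent` are illustrative and do not come from the PR. Averaging the two runs per join type is also an assumption about how to summarize them.

```python
# Timings in milliseconds, copied from the benchmark output above.
before = {
    "LeftOuterJoin": [16750, 15179],
    "RightOuterJoin": [15515, 15276],
    "FullOuterJoin": [19150, 18935],
}
after = {
    "LeftOuterJoin": [15218, 13503],
    "RightOuterJoin": [13663, 14025],
    "FullOuterJoin": [16624, 16578],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def reduction_percent(before_ms, after_ms):
    """Percentage reduction in mean run time from before to after (hypothetical helper)."""
    b = mean(before_ms)
    a = mean(after_ms)
    return (b - a) / b * 100.0


# Assumption: the two runs of each join type are averaged before comparing.
reductions = {name: reduction_percent(before[name], after[name]) for name in before}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% less time")
```

Averaged over the two runs, this gives roughly 10% less time for the left and right outer joins and about 13% for the full outer join.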
Besides the performance improvement, I also did some cleanup as suggested in #1147.
Author: Cheng Hao <hao.cheng@intel.com>
Closes #1765 from chenghao-intel/hash_outer_join_fixing and squashes the following commits:
ab1f9e0 [Cheng Hao] Reduce the memory copy while building the hashmap
Diffstat (limited to 'mllib')
0 files changed, 0 insertions, 0 deletions