path: root/mllib
author: Cheng Hao <hao.cheng@intel.com>, 2014-08-11 20:45:14 -0700
committer: Michael Armbrust <michael@databricks.com>, 2014-08-11 20:45:14 -0700
commit: 5d54d71ddbac1fbb26925a8c9138bbb8c0e81db8
tree: 01916ba2c12f4bd1a967adc46bb095eeac13d8a3 /mllib
parent: bad21ed085a505559dccc06223b486170371ddd2
[SQL] [SPARK-2826] Reduce the memory copy while building the hashmap for HashOuterJoin
This is a follow-up to #1147. In my local tests, this PR improves performance by about 10% to 15%.

```
Before:
LeftOuterJoin: took 16750 ms ([3000000] records)
LeftOuterJoin: took 15179 ms ([3000000] records)
RightOuterJoin: took 15515 ms ([3000000] records)
RightOuterJoin: took 15276 ms ([3000000] records)
FullOuterJoin: took 19150 ms ([6000000] records)
FullOuterJoin: took 18935 ms ([6000000] records)

After:
LeftOuterJoin: took 15218 ms ([3000000] records)
LeftOuterJoin: took 13503 ms ([3000000] records)
RightOuterJoin: took 13663 ms ([3000000] records)
RightOuterJoin: took 14025 ms ([3000000] records)
FullOuterJoin: took 16624 ms ([6000000] records)
FullOuterJoin: took 16578 ms ([6000000] records)
```

Besides the performance improvement, I also did some cleanup as suggested in #1147.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #1765 from chenghao-intel/hash_outer_join_fixing and squashes the following commits:

ab1f9e0 [Cheng Hao] Reduce the memory copy while building the hashmap
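
As a rough cross-check of the timings quoted in the commit message, the Python sketch below averages the two runs per join type and computes the relative reduction in wall-clock time. The helper name `mean_reduction_pct` and the choice to average the two runs are assumptions made for illustration; they are not part of the commit or of the original benchmark harness.

```python
from statistics import mean

# Timings in milliseconds, copied from the commit message (two runs per join type).
before = {
    "LeftOuterJoin": [16750, 15179],
    "RightOuterJoin": [15515, 15276],
    "FullOuterJoin": [19150, 18935],
}
after = {
    "LeftOuterJoin": [15218, 13503],
    "RightOuterJoin": [13663, 14025],
    "FullOuterJoin": [16624, 16578],
}


def mean_reduction_pct(before_ms, after_ms):
    """Percentage reduction in mean run time (hypothetical helper)."""
    b = mean(before_ms)
    a = mean(after_ms)
    return (b - a) / b * 100.0


# Relative improvement per join type, based on the averaged runs.
results = {name: mean_reduction_pct(before[name], after[name]) for name in before}

for name, pct in results.items():
    print(f"{name}: {pct:.1f}% less time on average")
```

With these inputs, the averaged runs come out at roughly a 10% reduction for the left and right outer joins and roughly 13% for the full outer join. With only two runs per case, these figures are indicative rather than statistically robust.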
Diffstat (limited to 'mllib')
0 files changed, 0 insertions, 0 deletions