author | Yin Huai <yhuai@databricks.com> | 2015-08-02 23:32:09 -0700 |
---|---|---|
committer | Josh Rosen <joshrosen@databricks.com> | 2015-08-02 23:32:09 -0700 |
commit | 687c8c37150f4c93f8e57d86bb56321a4891286b (patch) | |
tree | 5fc768cdf7b01dae261706c148c7fcd3cf622b9d /unsafe/src | |
parent | 4cdd8ecd66769316e8593da7790b84cd867968cd (diff) | |
[SPARK-9372] [SQL] Filter nulls in join keys
This PR adds an optimization rule, `FilterNullsInJoinKey`, that inserts a `Filter` before join operators to drop rows whose join keys are null.
This optimization is guarded by a new SQL conf, `spark.sql.advancedOptimization`.
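The idea behind the rule can be illustrated outside Spark. The sketch below is plain Python, not the Catalyst implementation: `inner_equi_join` and `filter_null_keys` are hypothetical helpers written for this example, and `None` stands in for SQL NULL. It assumes an inner equi-join, where a null key can never satisfy `key = key`, so dropping such rows before the join leaves the result unchanged while shrinking the join input. Which join types the actual rule covers is not stated in this message.

```python
from typing import List, Optional, Tuple

# (join key, payload); None models SQL NULL. Illustrative types only.
Row = Tuple[Optional[int], str]


def inner_equi_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Nested-loop inner join on the key column.

    A None key never matches anything, mirroring SQL, where NULL = NULL
    does not evaluate to true.
    """
    return [
        (lk, lv, rv)
        for lk, lv in left
        for rk, rv in right
        if lk is not None and rk is not None and lk == rk
    ]


def filter_null_keys(rows: List[Row]) -> List[Row]:
    """Hypothetical stand-in for the Filter placed under the join."""
    return [row for row in rows if row[0] is not None]


left = [(1, "a"), (None, "b"), (2, "c")]
right = [(1, "x"), (None, "y"), (3, "z")]

# Joining the raw inputs and joining the pre-filtered inputs give the same rows.
unfiltered = inner_equi_join(left, right)
filtered = inner_equi_join(filter_null_keys(left), filter_null_keys(right))
```

Both joins produce the same rows, but the filtered variant feeds fewer rows into the join. In a distributed engine this also means fewer rows to shuffle, which is the likely motivation for filtering early.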
The code in this PR was authored by yhuai; I'm opening this PR to factor out this change from #7685, a larger pull request which contains two other optimizations.
Author: Yin Huai <yhuai@databricks.com>
Author: Josh Rosen <joshrosen@databricks.com>
Closes #7768 from JoshRosen/filter-nulls-in-join-key and squashes the following commits:
c02fc3f [Yin Huai] Address Josh's comments.
0a8e096 [Yin Huai] Update comments.
ea7d5a6 [Yin Huai] Make sure we do not keep adding filters.
be88760 [Yin Huai] Make it clear that FilterNullsInJoinKeySuite.scala is used to test FilterNullsInJoinKey.
8bb39ad [Yin Huai] Fix non-deterministic tests.
303236b [Josh Rosen] Revert changes that are unrelated to null join key filtering
40eeece [Josh Rosen] Merge remote-tracking branch 'origin/master' into filter-nulls-in-join-key
c57a954 [Yin Huai] Bug fix.
d3d2e64 [Yin Huai] First round of cleanup.
f9516b0 [Yin Huai] Style
c6667e7 [Yin Huai] Add PartitioningCollection.
e616d3b [Yin Huai] wip
7c2d2d8 [Yin Huai] Bug fix and refactoring.
69bb072 [Yin Huai] Introduce NullSafeHashPartitioning and NullUnsafePartitioning.
d5b84c3 [Yin Huai] Do not add unnecessary filters.
2201129 [Yin Huai] Filter out rows that will not be joined in equal joins early.
Diffstat (limited to 'unsafe/src')
0 files changed, 0 insertions, 0 deletions