author | Sameer Agarwal <sameer@databricks.com> | 2016-03-07 12:04:59 -0800 |
---|---|---|
committer | Yin Huai <yhuai@databricks.com> | 2016-03-07 12:04:59 -0800 |
commit | ef77003178eb5cdcb4fe519fc540917656c5d577 (patch) | |
tree | e98a1feca6b9a8b80a767d938a4bf31e9c61d9af /docs/ml-ann.md | |
parent | 489641117651d11806d2773b7ded7c163d0260e5 (diff) | |
download | spark-ef77003178eb5cdcb4fe519fc540917656c5d577.tar.gz spark-ef77003178eb5cdcb4fe519fc540917656c5d577.tar.bz2 spark-ef77003178eb5cdcb4fe519fc540917656c5d577.zip |
[SPARK-13495][SQL] Add Null Filters in the query plan for Filters/Joins based on their data constraints
## What changes were proposed in this pull request?
This PR adds an optimizer rule that avoids reading unnecessary NULL values, when they are not required for correctness, by inserting `isNotNull` filters in the query plan. These filters are currently inserted beneath existing `Filter` and `Join` operators and are inferred from those operators' data constraints.
Note: While this optimization is applicable to all types of join, it primarily benefits `Inner` and `LeftSemi` joins.
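To illustrate why such filters preserve correctness, here is a minimal sketch in plain Python. It is a toy model of SQL three-valued logic, not Spark's optimizer code. All names in it (`sql_gt`, `sql_eq`, `sql_filter`, `inner_join`, `is_not_null`) are hypothetical helpers introduced for this example, with `None` standing in for NULL.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[int]]


def sql_gt(value: Optional[int], bound: int) -> Optional[bool]:
    """SQL-style `value > bound`: comparing NULL (None) yields NULL."""
    if value is None:
        return None
    return value > bound


def sql_eq(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """SQL-style `a = b`: NULL on either side yields NULL, never True."""
    if a is None or b is None:
        return None
    return a == b


def sql_filter(rows: List[Row], predicate: Callable[[Row], Optional[bool]]) -> List[Row]:
    """Keep rows whose predicate is exactly True; NULL and False are dropped."""
    return [r for r in rows if predicate(r) is True]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Toy nested-loop inner equi-join using SQL equality semantics."""
    return [(l, r) for l in left for r in right if sql_eq(l[key], r[key]) is True]


def is_not_null(rows: List[Row], column: str) -> List[Row]:
    """Hypothetical stand-in for an inferred not-null filter on one column."""
    return [r for r in rows if r[column] is not None]


# Filter case: `a > 5` can never be True for a NULL `a`,
# so dropping NULLs first does not change the result.
rows: List[Row] = [{"a": 1}, {"a": None}, {"a": 7}, {"a": 9}]
plain = sql_filter(rows, lambda r: sql_gt(r["a"], 5))
with_null_filter = sql_filter(is_not_null(rows, "a"), lambda r: sql_gt(r["a"], 5))

# Inner join case: NULL keys never match, so both sides can be pre-filtered.
left: List[Row] = [{"k": 1}, {"k": None}, {"k": 2}]
right: List[Row] = [{"k": None}, {"k": 2}, {"k": 3}]
joined_plain = inner_join(left, right, "k")
joined_filtered = inner_join(is_not_null(left, "k"), is_not_null(right, "k"), "k")
```

In both cases the pre-filtered input produces the same output as the unfiltered one, while the downstream operator sees fewer rows. For an outer join, rows on the preserved side must be emitted even when their key is NULL, so the same pre-filter would not be safe on that side in general.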
## How was this patch tested?
1. Added a new `NullFilteringSuite` that tests for `IsNotNull` filters in the query plan for joins and filters. It also tests the interaction with the `CombineFilters` optimizer rule.
2. Tested the generated ExpressionTrees via `OrcFilterSuite`.
3. Tested the filter source pushdown logic via `SimpleTextHadoopFsRelationSuite`.
cc yhuai nongli
Author: Sameer Agarwal <sameer@databricks.com>
Closes #11372 from sameeragarwal/gen-isnotnull.
Diffstat (limited to 'docs/ml-ann.md')
0 files changed, 0 insertions, 0 deletions