[SPARK-13495][SQL] Add Null Filters in the query plan for Filters/Joins based on their data constraints - spark

author	Sameer Agarwal <sameer@databricks.com>	2016-03-07 12:04:59 -0800
committer	Yin Huai <yhuai@databricks.com>	2016-03-07 12:04:59 -0800
commit	ef77003178eb5cdcb4fe519fc540917656c5d577 (patch)
tree	e98a1feca6b9a8b80a767d938a4bf31e9c61d9af /docs/ml-ann.md
parent	489641117651d11806d2773b7ded7c163d0260e5 (diff)

[SPARK-13495][SQL] Add Null Filters in the query plan for Filters/Joins based on their data constraints

## What changes were proposed in this pull request?

This PR adds an optimizer rule that avoids reading NULL values that are not required for correctness, by inserting `isNotNull` filters in the query plan. These filters are currently inserted beneath existing `Filter` and `Join` operators and are inferred from the data constraints of those operators.

Note: While this optimization is applicable to all types of join, it primarily benefits `Inner` and `LeftSemi` joins.

## How was this patch tested?

1. Added a new `NullFilteringSuite` that tests for `IsNotNull` filters in the query plan for joins and filters. It also tests the interaction with the `CombineFilters` optimizer rule.
2. Tested generated ExpressionTrees via `OrcFilterSuite`.
3. Tested filter source pushdown logic via `SimpleTextHadoopFsRelationSuite`.

cc yhuai nongli

Author: Sameer Agarwal <sameer@databricks.com>

Closes #11372 from sameeragarwal/gen-isnotnull.
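The idea behind the rule in the commit message above can be illustrated with a small, self-contained Python model. This is only a sketch under stated assumptions, not Spark's Catalyst implementation: `Cmp`, `infer_not_null`, `add_null_filter`, `sql_eq`, and `inner_join` are hypothetical names invented for the illustration, and rows are modelled as plain dictionaries with `None` standing in for SQL NULL. It shows that a comparison such as `l_k = r_k` can never evaluate to true when either side is NULL, so an "is not null" filter on both columns can be placed beneath an inner join without changing its result.

```python
from typing import NamedTuple


class Cmp(NamedTuple):
    """Hypothetical null-intolerant comparison between two columns."""
    left: str
    op: str  # one of "=", "<", ">"
    right: str


def infer_not_null(conjuncts):
    """Columns that must be non-null for every conjunct to be true."""
    cols = set()
    for c in conjuncts:
        cols.update((c.left, c.right))
    return sorted(cols)


def add_null_filter(rows, cols):
    """Drop rows holding None in any inferred column that the row contains."""
    return [r for r in rows
            if all(r[c] is not None for c in cols if c in r)]


def sql_eq(a, b):
    """SQL-style equality: the result is unknown (None) if either side is NULL."""
    return None if a is None or b is None else a == b


def inner_join(left, right, lkey, rkey):
    """Naive nested-loop inner join that keeps only pairs whose keys compare true."""
    return [{**l, **r} for l in left for r in right
            if sql_eq(l[lkey], r[rkey]) is True]


left = [{"l_id": 1, "l_k": 10}, {"l_id": 2, "l_k": None}, {"l_id": 3, "l_k": 30}]
right = [{"r_id": 1, "r_k": 10}, {"r_id": 2, "r_k": None}]

# Infer the not-null columns from the join condition, then pre-filter both inputs.
not_null_cols = infer_not_null([Cmp("l_k", "=", "r_k")])
left_filtered = add_null_filter(left, not_null_cols)
right_filtered = add_null_filter(right, not_null_cols)

baseline = inner_join(left, right, "l_k", "r_k")
optimized = inner_join(left_filtered, right_filtered, "l_k", "r_k")
```

In this toy model the pre-filtered inputs are smaller while `baseline` and `optimized` contain the same rows, which is the effect the inserted filters aim for on inner joins.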

Diffstat (limited to 'docs/ml-ann.md')

0 files changed, 0 insertions, 0 deletions

