author | gatorsmile <gatorsmile@gmail.com> | 2016-03-15 00:30:14 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-03-15 00:30:14 -0700 |
commit | 99bd2f0e94657687834c5c59c4270c1484c9f595 (patch) | |
tree | 524dd1a4635dba48b6e83e1cd3ef324e790b05a9 /sql/core/src | |
parent | 276c2d51a3bbe2531763a11580adfec7e39fdd58 (diff) | |
[SPARK-13840][SQL] Split Optimizer Rule ColumnPruning to ColumnPruning and EliminateOperator
#### What changes were proposed in this pull request?
Before this PR, the two optimizer rules `ColumnPruning` and `PushPredicateThroughProject` reversed each other's effects. As a result, for some queries the optimizer never converged and always ran to the maximum number of iterations, leaving extra `Project` operators in the plan. For example, below is the optimized plan after reaching 100 iterations:
```
Join Inner, Some((cast(id1#16 as bigint) = id1#18L))
:- Project [id1#16]
:  +- Filter isnotnull(cast(id1#16 as bigint))
:     +- Project [id1#16]
:        +- Relation[id1#16,newCol#17] JSON part: struct<>, data: struct<id1:int,newCol:int>
+- Filter isnotnull(id1#18L)
   +- Relation[id1#18L] JSON part: struct<>, data: struct<id1:bigint>
```
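The non-convergence described above can be illustrated with a toy fixed-point executor. This is only a sketch and not Catalyst code: the plan is modeled as a tuple of operator names from root to leaf, and `prune_below_filter`, `push_filter_through_project`, and `run_to_fixed_point` are hypothetical stand-ins whose behavior is deliberately simplified (the toy pruning rule inserts a `Project` unconditionally, which the real rule does not do). It shows how two rules that keep undoing each other's structural changes make every pass modify the plan, so the executor stops only at the iteration cap and leaves extra `Project` nodes behind.

```python
from typing import Callable, Sequence, Tuple

Plan = Tuple[str, ...]           # operator names, listed from root to leaf
Rule = Callable[[Plan], Plan]


def prune_below_filter(plan: Plan) -> Plan:
    """Toy stand-in (assumption): insert a Project between a Filter and its Relation."""
    ops = list(plan)
    for i in range(len(ops) - 1):
        if ops[i] == "Filter" and ops[i + 1] == "Relation":
            ops.insert(i + 1, "Project")
            break
    return tuple(ops)


def push_filter_through_project(plan: Plan) -> Plan:
    """Toy stand-in (assumption): move a Filter below the Project directly under it."""
    ops = list(plan)
    for i in range(len(ops) - 1):
        if ops[i] == "Filter" and ops[i + 1] == "Project":
            ops[i], ops[i + 1] = ops[i + 1], ops[i]
            break
    return tuple(ops)


def run_to_fixed_point(
    plan: Plan, rules: Sequence[Rule], max_iterations: int = 100
) -> Tuple[Plan, int, bool]:
    """Apply all rules repeatedly until one full pass leaves the plan unchanged.

    Returns (final plan, passes executed, whether a fixed point was reached).
    """
    for iteration in range(1, max_iterations + 1):
        previous = plan
        for rule in rules:
            plan = rule(plan)
        if plan == previous:
            return plan, iteration, True
    return plan, max_iterations, False


start: Plan = ("Project", "Filter", "Relation")

# Each pass re-creates the pattern the other rule rewrites, so no pass is a no-op.
final_plan, passes, converged = run_to_fixed_point(
    start, [prune_below_filter, push_filter_through_project]
)
print(converged, passes, final_plan.count("Project"))
```

In this toy model, running `push_filter_through_project` on its own converges after two passes, because nothing re-creates the `Filter`-over-`Project` pattern once it has been rewritten.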
This PR splits the optimizer rule `ColumnPruning` into `ColumnPruning` and `EliminateOperators`.
The issue gets worse when another rule, `NullFiltering`, is present, since it can add extra `Filter` operators for `IsNotNull`. We have to be careful about introducing an extra `Filter` when the benefit is not large enough. sameeragarwal will submit another PR to handle this issue.
cc sameeragarwal marmbrus
In addition, `ColumnPruning` should not push a `Project` through a non-deterministic `Filter`, because doing so could produce wrong results. That fix will go into a separate PR.
cc davies cloud-fan yhuai
#### How was this patch tested?
Modified the existing test cases.
Author: gatorsmile <gatorsmile@gmail.com>
Closes #11682 from gatorsmile/viewDuplicateNames.