author: gatorsmile <gatorsmile@gmail.com> 2016-03-15 00:30:14 -0700
committer: Reynold Xin <rxin@databricks.com> 2016-03-15 00:30:14 -0700
commit: 99bd2f0e94657687834c5c59c4270c1484c9f595
tree: 524dd1a4635dba48b6e83e1cd3ef324e790b05a9
parent: 276c2d51a3bbe2531763a11580adfec7e39fdd58
[SPARK-13840][SQL] Split Optimizer Rule ColumnPruning to ColumnPruning and EliminateOperator
#### What changes were proposed in this pull request?

Before this PR, the two Optimizer rules `ColumnPruning` and `PushPredicateThroughProject` reversed each other's effects. As a result, the Optimizer always reached the maximum number of iterations when optimizing some queries, and extra `Project` operators were left in the plan. For example, below is the optimized plan after reaching 100 iterations:

```
Join Inner, Some((cast(id1#16 as bigint) = id1#18L))
:- Project [id1#16]
:  +- Filter isnotnull(cast(id1#16 as bigint))
:     +- Project [id1#16]
:        +- Relation[id1#16,newCol#17] JSON part: struct<>, data: struct<id1:int,newCol:int>
+- Filter isnotnull(id1#18L)
   +- Relation[id1#18L] JSON part: struct<>, data: struct<id1:bigint>
```

This PR splits the optimizer rule `ColumnPruning` into `ColumnPruning` and `EliminateOperators`.

The issue becomes worse with another rule, `NullFiltering`, which can add extra `Filter`s for `IsNotNull`. We have to be careful about introducing an extra `Filter` when the benefit is not large enough. Another PR will be submitted by sameeragarwal to handle this issue. cc sameeragarwal marmbrus

In addition, `ColumnPruning` should not push a `Project` through a non-deterministic `Filter`, because this could produce wrong results. This will be addressed in a separate PR. cc davies cloud-fan yhuai

#### How was this patch tested?

Modified the existing test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #11682 from gatorsmile/viewDuplicateNames.
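The non-convergence described in this commit message can be illustrated with a deliberately simplified Scala model. Everything in it is hypothetical: `Plan`, `Relation`, `Project`, and `Filter` are toy case classes named after the Catalyst operators, and `pruneUnderFilter`, `pushFilterThroughProject`, `eliminateRedundantProject`, `execute`, and `maxIterations` are illustrative stand-ins, not Spark's actual `ColumnPruning`, `PushPredicateThroughProject`, or `EliminateOperators` rules, nor Catalyst's `RuleExecutor` API. The sketch only reproduces the symptom: when one rule inserts a `Project` under a `Filter` and another pushes the `Filter` back under that `Project`, the plan changes on every pass, so a fixed-point executor stops only at its iteration cap and leaves stacked `Project` nodes behind. Giving the removal of redundant `Project`s its own rule in the batch lets the plan reach a fixed point.

```scala
// Toy model of a fixed-point rule executor. These classes and rules are
// hypothetical stand-ins, NOT Spark Catalyst's real operators or rules.
sealed trait Plan { def output: Set[String] }
final case class Relation(cols: Set[String]) extends Plan { def output: Set[String] = cols }
final case class Project(cols: Set[String], child: Plan) extends Plan { def output: Set[String] = cols }
// `refs` stands in for the columns referenced by the filter condition.
final case class Filter(refs: Set[String], child: Plan) extends Plan { def output: Set[String] = child.output }

object ToyOptimizer {
  type Rule = Plan => Plan

  // Rewrite the children first, then try `rule` on the node itself (one bottom-up pass).
  def transformUp(plan: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
    val withNewChildren: Plan = plan match {
      case Project(c, child) => Project(c, transformUp(child)(rule))
      case Filter(r, child)  => Filter(r, transformUp(child)(rule))
      case leaf: Relation    => leaf
    }
    rule.applyOrElse(withNewChildren, (p: Plan) => p)
  }

  // Hypothetical pruning step: insert a Project under a Filter whose input
  // carries columns that neither the Filter nor the parent Project needs.
  val pruneUnderFilter: Rule = plan => transformUp(plan) {
    case Project(p, Filter(r, child)) if !child.isInstanceOf[Project] && (child.output -- (p ++ r)).nonEmpty =>
      Project(p, Filter(r, Project(p ++ r, child)))
  }

  // Hypothetical predicate pushdown: move a Filter below a Project that keeps its columns.
  val pushFilterThroughProject: Rule = plan => transformUp(plan) {
    case Filter(r, Project(p, child)) if r.subsetOf(p) => Project(p, Filter(r, child))
  }

  // Hypothetical elimination step: drop an inner Project made redundant by the outer one.
  val eliminateRedundantProject: Rule = plan => transformUp(plan) {
    case Project(p1, Project(p2, child)) if p1.subsetOf(p2) => Project(p1, child)
  }

  // Apply the whole batch repeatedly until the plan stops changing or the cap is hit.
  // Returns (final plan, iterations run, whether a fixed point was reached).
  def execute(plan: Plan, batch: Seq[Rule], maxIterations: Int): (Plan, Int, Boolean) = {
    var current = plan
    var iterations = 0
    var converged = false
    while (!converged && iterations < maxIterations) {
      val next = batch.foldLeft(current)((p, rule) => rule(p))
      iterations += 1
      converged = next == current
      current = next
    }
    (current, iterations, converged)
  }

  // Count Project nodes, to make leftover redundant Projects visible.
  def projectCount(plan: Plan): Int = plan match {
    case Project(_, child) => 1 + projectCount(child)
    case Filter(_, child)  => projectCount(child)
    case Relation(_)       => 0
  }
}

object SplitRuleDemo {
  import ToyOptimizer._

  def main(args: Array[String]): Unit = {
    val start = Project(Set("id1"), Filter(Set("id1"), Relation(Set("id1", "newCol"))))

    val (before, n1, ok1) =
      execute(start, Seq(pruneUnderFilter, pushFilterThroughProject), maxIterations = 100)
    println(s"without elimination: converged=$ok1, iterations=$n1, projects=${projectCount(before)}")

    val (after, n2, ok2) =
      execute(start, Seq(pruneUnderFilter, pushFilterThroughProject, eliminateRedundantProject), maxIterations = 100)
    println(s"with elimination:    converged=$ok2, iterations=$n2, projects=${projectCount(after)}")
  }
}
```

In this toy, the first batch should report `converged=false` after 100 iterations with 101 `Project` nodes stacked above the `Filter`, while the batch with the separate elimination rule should converge after one iteration. Note that here the elimination rule simply returns the starting plan; the sketch demonstrates reaching a fixed point, not how Spark decides where column pruning pays off.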