diff options
author | jiangxingbo <jiangxb1987@gmail.com> | 2016-09-13 17:04:51 +0200 |
---|---|---|
committer | Herman van Hovell <hvanhovell@databricks.com> | 2016-09-13 17:04:51 +0200 |
commit | 4ba63b193c1ac292493e06343d9d618c12c5ef3f (patch) | |
tree | a2de62841e287d2e52681bba9421468a337ca739 /.gitignore | |
parent | 3f6a2bb3f7beac4ce928eb660ee36258b5b9e8c8 (diff) | |
download | spark-4ba63b193c1ac292493e06343d9d618c12c5ef3f.tar.gz spark-4ba63b193c1ac292493e06343d9d618c12c5ef3f.tar.bz2 spark-4ba63b193c1ac292493e06343d9d618c12c5ef3f.zip |
[SPARK-17142][SQL] Complex query triggers binding error in HashAggregateExec
## What changes were proposed in this pull request?
In `ReorderAssociativeOperator` rule, we extract foldable expressions with Add/Multiply arithmetics, and replace with eval literal. For example, `(a + 1) + (b + 2)` is optimized to `(a + b + 3)` by this rule.
For aggregate operator, output expressions should be derived from groupingExpressions, current implemenation of `ReorderAssociativeOperator` rule may break this promise. A instance could be:
```
SELECT
((t1.a + 1) + (t2.a + 2)) AS out_col
FROM
testdata2 AS t1
INNER JOIN
testdata2 AS t2
ON
(t1.a = t2.a)
GROUP BY (t1.a + 1), (t2.a + 2)
```
`((t1.a + 1) + (t2.a + 2))` is optimized to `(t1.a + t2.a + 3)`, which could not be derived from `ExpressionSet((t1.a +1), (t2.a + 2))`.
Maybe we should improve the rule of `ReorderAssociativeOperator` by adding a GroupingExpressionSet to keep Aggregate.groupingExpressions, and respect these expressions during the optimize stage.
## How was this patch tested?
Add new test case in `ReorderAssociativeOperatorSuite`.
Author: jiangxingbo <jiangxb1987@gmail.com>
Closes #14917 from jiangxb1987/rao.
Diffstat (limited to '.gitignore')
0 files changed, 0 insertions, 0 deletions