author     Sean Zhong <seanzhong@databricks.com>    2016-08-16 15:51:30 +0800
committer  Wenchen Fan <wenchen@databricks.com>     2016-08-16 15:51:30 +0800
commit     7b65030e7a0af3a0bd09370fb069d659b36ff7f0 (patch)
tree       c820a00facee5871059e8412c23125823944b838 /sql/core/src
parent     7de30d6e9e5d3020d2ba8c2ce08893d9cd822b56 (diff)
[SPARK-17034][SQL] adds expression UnresolvedOrdinal to represent the ordinals in GROUP BY or ORDER BY
## What changes were proposed in this pull request?
This PR adds the expression `UnresolvedOrdinal` to represent an ordinal in GROUP BY or ORDER BY, and fixes the analyzer rules that resolve ordinals.
Ordinals in GROUP BY or ORDER BY, like `1` in `order by 1` or `group by 1`, should be considered unresolved before analysis. But the current code stores the ordinal as a `Literal` expression. This is inappropriate: `Literal` itself is a resolved expression, so it gives the user the wrong impression that the ordinal has already been resolved.
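Conceptually, the new expression is a Catalyst leaf node whose `resolved` flag is hard-wired to `false`. A minimal sketch of that shape (illustrative only, not the exact source of this PR; the member implementations below are assumptions):
```
// Illustrative sketch of an unresolved-ordinal leaf expression.
// Unlike Literal, it reports resolved = false, so analyzer rules that
// require fully resolved input will not fire on it by mistake.
import org.apache.spark.sql.catalyst.expressions.{LeafExpression, Unevaluable}
import org.apache.spark.sql.types.DataType

case class UnresolvedOrdinal(ordinal: Int) extends LeafExpression with Unevaluable {
  override def dataType: DataType =
    throw new UnsupportedOperationException("dataType of an unresolved ordinal")
  override def nullable: Boolean =
    throw new UnsupportedOperationException("nullable of an unresolved ordinal")
  override lazy val resolved: Boolean = false // the key difference from Literal
  override def toString: String = s"unresolvedordinal($ordinal)"
}
```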
### Before this change
The ordinal is stored as a `Literal` expression:
```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [1 ASC], true
+- 'Aggregate [1], ['a]
+- 'UnresolvedRelation `t`
```
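The misleading part is visible directly from the API: `Literal` reports itself as resolved, so nothing in the plan above signals that `1` still needs to be mapped to a select-list position. For example, in a spark-shell:
```
import org.apache.spark.sql.catalyst.expressions.Literal

// A Literal is already a resolved expression, so an ordinal stored this
// way looks "done" to the analyzer even though it still means "column #1".
Literal(1).resolved // true
```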
For the query:
```
scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show
```
During analysis, the intermediate plan before applying rule `ResolveAggregateFunctions` is:
```
'Filter ('a > 0)
+- Aggregate [2], [count(1) AS count(1)#83L, a#81]
+- LocalRelation [value#7 AS a#9]
```
Before this PR, the rule `ResolveAggregateFunctions` assumes that all expressions of `Aggregate` have already been resolved, and tries to resolve the expressions in `Filter` directly. This is wrong: the ordinal `2` in `Aggregate` is not actually resolved yet!
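The fix is to substitute such literals before the main resolution rules run. A hedged sketch of that substitution step (the rule name and pattern matching here are assumptions; the real rule also honors the `spark.sql.groupByOrdinal` and `spark.sql.orderByOrdinal` settings, omitted for brevity):
```
import org.apache.spark.sql.catalyst.expressions.{IntegerLiteral, SortOrder}
import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort}
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch: wrap integer literals that appear directly in GROUP BY or
// ORDER BY into UnresolvedOrdinal (as sketched above), right after
// parsing, so downstream rules see them as unresolved.
object SubstituteUnresolvedOrdinals extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case s: Sort =>
      s.copy(order = s.order.map {
        case order @ SortOrder(IntegerLiteral(i), _) =>
          order.copy(child = UnresolvedOrdinal(i))
        case other => other
      })
    case a: Aggregate =>
      a.copy(groupingExpressions = a.groupingExpressions.map {
        case IntegerLiteral(i) => UnresolvedOrdinal(i)
        case other => other
      })
  }
}
```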
### After this change
Ordinals are stored as `UnresolvedOrdinal`.
```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [unresolvedordinal(1) ASC], true
+- 'Aggregate [unresolvedordinal(1)], ['a]
+- 'UnresolvedRelation `t`
```
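With the ordinal explicitly unresolved, a later analyzer rule can map it onto the select list in one controlled place. A simplified sketch of that resolution step for GROUP BY, again assuming the `UnresolvedOrdinal` shape above (the actual rule and its error handling may differ):
```
import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch: once the select list itself is resolved, GROUP BY <i> can be
// replaced by the i-th select-list expression; anything else is an error.
object ResolveOrdinalInGroupBy extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case agg @ Aggregate(groups, aggs, _) if aggs.forall(_.resolved) =>
      agg.copy(groupingExpressions = groups.map {
        case UnresolvedOrdinal(i) if i > 0 && i <= aggs.size =>
          aggs(i - 1) // e.g. GROUP BY 2 groups by the 2nd select item
        case UnresolvedOrdinal(i) =>
          // the real analyzer raises an AnalysisException here
          sys.error(s"GROUP BY position $i is not in select list")
        case other => other
      })
  }
}
```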
## How was this patch tested?
Unit tests.
Author: Sean Zhong <seanzhong@databricks.com>
Closes #14616 from clockfly/spark-16955.
Diffstat (limited to 'sql/core/src')
sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql      |  6
sql/core/src/test/resources/sql-tests/results/group-by-ordinal.sql.out | 28
2 files changed, 28 insertions(+), 6 deletions(-)
```
diff --git a/sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql b/sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql
index 36b469c617..9c8d851e36 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql
@@ -43,6 +43,12 @@ select a, rand(0), sum(b) from data group by a, 2;
 
 -- negative case: star
 select * from data group by a, b, 1;
 
+-- group by ordinal followed by order by
+select a, count(a) from (select 1 as a) tmp group by 1 order by 1;
+
+-- group by ordinal followed by having
+select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
+
 -- turn of group by ordinal
 set spark.sql.groupByOrdinal=false;
diff --git a/sql/core/src/test/resources/sql-tests/results/group-by-ordinal.sql.out b/sql/core/src/test/resources/sql-tests/results/group-by-ordinal.sql.out
index 2f10b7ebc6..9c3a145f3a 100644
--- a/sql/core/src/test/resources/sql-tests/results/group-by-ordinal.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/group-by-ordinal.sql.out
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 17
+-- Number of queries: 19
 
 
 -- !query 0
@@ -153,16 +153,32 @@ Star (*) is not allowed in select list when GROUP BY ordinal position is used;
 
 
 -- !query 15
-set spark.sql.groupByOrdinal=false
+select a, count(a) from (select 1 as a) tmp group by 1 order by 1
 -- !query 15 schema
-struct<key:string,value:string>
+struct<a:int,count(a):bigint>
 -- !query 15 output
-spark.sql.groupByOrdinal
+1 1
 
 
 -- !query 16
-select sum(b) from data group by -1
+select count(a), a from (select 1 as a) tmp group by 2 having a > 0
 -- !query 16 schema
-struct<sum(b):bigint>
+struct<count(a):bigint,a:int>
 -- !query 16 output
+1 1
+
+
+-- !query 17
+set spark.sql.groupByOrdinal=false
+-- !query 17 schema
+struct<key:string,value:string>
+-- !query 17 output
+spark.sql.groupByOrdinal
+
+
+-- !query 18
+select sum(b) from data group by -1
+-- !query 18 schema
+struct<sum(b):bigint>
+-- !query 18 output
 9
```