| author | jiangxingbo <jiangxb1987@gmail.com> | 2016-11-01 11:25:11 -0700 |
|---|---|---|
| committer | Reynold Xin <rxin@databricks.com> | 2016-11-01 11:25:11 -0700 |
| commit | d0272b436512b71f04313e109d3d21a6e9deefca | |
| tree | 6e6e64f41ded2f8f6e3636626f185a9ba726a80d /dev | |
| parent | 8a538c97b556f80f67c80519af0ce879557050d5 | |
[SPARK-18148][SQL] Misleading Error Message for Aggregation Without Window/GroupBy
## What changes were proposed in this pull request?
An aggregation without window or GROUP BY expressions fails in `checkAnalysis`, but the error message is misleading; we should generate a more specific error message for this case.
For example,
```
spark.read.load("/some-data")
.withColumn("date_dt", to_date($"date"))
.withColumn("year", year($"date_dt"))
.withColumn("week", weekofyear($"date_dt"))
.withColumn("user_count", count($"userId"))
.withColumn("daily_max_in_week", max($"user_count").over(weeklyWindow))
```
creates the following output:
```
org.apache.spark.sql.AnalysisException: expression '`randomColumn`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
```
In the error message above, `randomColumn` doesn't appear in the query (it is actually added by the `withColumn` calls), so the message isn't enough for the user to locate the problem.
## How was this patch tested?
Manually tested.
Before:
```
scala> spark.sql("select col, count(col) from tbl")
org.apache.spark.sql.AnalysisException: expression 'tbl.`col`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
```
After:
```
scala> spark.sql("select col, count(col) from tbl")
org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and 'tbl.`col`' is not an aggregate function. Wrap '(count(col#231L) AS count(col)#239L)' in windowing function(s) or wrap 'tbl.`col`' in first() (or first_value) if you don't care which value you get.;;
```
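The improved check can be sketched roughly as follows. This is a simplified, hypothetical model of the analyzer rule (not Spark's actual `checkAnalysis` code): when the grouping-expression list is empty, a non-aggregate select expression now triggers the new, more specific message instead of the generic "neither present in the group by" one.

```python
def check_aggregation(select_exprs, grouping_exprs):
    """Simplified sketch of the improved aggregation check.

    select_exprs: list of (expr_name, is_aggregate) pairs for the SELECT list.
    grouping_exprs: set of expression names appearing in GROUP BY (may be empty).
    Returns None if the query is valid, or an error message string.
    """
    for name, is_agg in select_exprs:
        if is_agg or name in grouping_exprs:
            continue
        if not grouping_exprs:
            # New, more specific message for the empty-GROUP-BY case.
            return (f"grouping expressions sequence is empty, and '{name}' "
                    "is not an aggregate function. Wrap it in windowing "
                    "function(s) or wrap it in first() (or first_value) "
                    "if you don't care which value you get.")
        # Original message, kept for the non-empty GROUP BY case.
        return (f"expression '{name}' is neither present in the group by, "
                "nor is it an aggregate function. Add to group by or wrap "
                "in first() (or first_value) if you don't care which value "
                "you get.")
    return None
```

For example, `select col, count(col) from tbl` maps to `check_aggregation([("tbl.col", False), ("count(col)", True)], set())` and now yields the "grouping expressions sequence is empty" message.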
Also added new test SQL cases in `group-by.sql`.
Author: jiangxingbo <jiangxb1987@gmail.com>
Closes #15672 from jiangxb1987/groupBy-empty.