| author | jiangxingbo <jiangxb1987@gmail.com> | 2016-11-01 11:25:11 -0700 |
|---|---|---|
| committer | Reynold Xin <rxin@databricks.com> | 2016-11-01 11:25:11 -0700 |
| commit | d0272b436512b71f04313e109d3d21a6e9deefca | |
| tree | 6e6e64f41ded2f8f6e3636626f185a9ba726a80d /dev | |
| parent | 8a538c97b556f80f67c80519af0ce879557050d5 | |
[SPARK-18148][SQL] Misleading Error Message for Aggregation Without Window/GroupBy
## What changes were proposed in this pull request?
An aggregation without window or GROUP BY expressions fails in `checkAnalysis`, but the error message is misleading; we should generate a more specific error message for this case.
For example,
```
spark.read.load("/some-data")
.withColumn("date_dt", to_date($"date"))
.withColumn("year", year($"date_dt"))
.withColumn("week", weekofyear($"date_dt"))
.withColumn("user_count", count($"userId"))
.withColumn("daily_max_in_week", max($"user_count").over(weeklyWindow))
```
creates the following output:
```
org.apache.spark.sql.AnalysisException: expression '`randomColumn`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
```
In the error message above, `randomColumn` doesn't appear in the query (it is actually added by the `withColumn` calls), so the message isn't enough for the user to locate the problem.
## How was this patch tested?
Manually tested.
Before:
```
scala> spark.sql("select col, count(col) from tbl")
org.apache.spark.sql.AnalysisException: expression 'tbl.`col`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
```
After:
```
scala> spark.sql("select col, count(col) from tbl")
org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and 'tbl.`col`' is not an aggregate function. Wrap '(count(col#231L) AS count(col)#239L)' in windowing function(s) or wrap 'tbl.`col`' in first() (or first_value) if you don't care which value you get.;;
```
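The improved check can be sketched roughly as follows. This is a simplified, hypothetical model of the analyzer rule (not Spark's actual `checkAnalysis` code): when the grouping-expression list is empty, a non-aggregate select expression now triggers the new, more specific message instead of the generic "neither present in the group by" one.

```python
def check_aggregation(select_exprs, grouping_exprs):
    """Simplified sketch of the improved aggregation check.

    select_exprs: list of (expr_name, is_aggregate) pairs for the SELECT list.
    grouping_exprs: set of expression names appearing in GROUP BY (may be empty).
    Returns None if the query is valid, or an error message string.
    """
    for name, is_agg in select_exprs:
        if is_agg or name in grouping_exprs:
            continue
        if not grouping_exprs:
            # New, more specific message for the empty-GROUP-BY case.
            return (f"grouping expressions sequence is empty, and '{name}' "
                    "is not an aggregate function. Wrap it in windowing "
                    "function(s) or wrap it in first() (or first_value) "
                    "if you don't care which value you get.")
        # Original message, kept for the non-empty GROUP BY case.
        return (f"expression '{name}' is neither present in the group by, "
                "nor is it an aggregate function. Add to group by or wrap "
                "in first() (or first_value) if you don't care which value "
                "you get.")
    return None
```

For example, `select col, count(col) from tbl` maps to `check_aggregation([("tbl.col", False), ("count(col)", True)], set())` and now yields the "grouping expressions sequence is empty" message.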
Also added new test SQL cases in `group-by.sql`.
Author: jiangxingbo <jiangxb1987@gmail.com>
Closes #15672 from jiangxb1987/groupBy-empty.