author    gatorsmile <gatorsmile@gmail.com>    2016-03-05 19:25:03 +0800
committer Cheng Lian <lian@databricks.com>    2016-03-05 19:25:03 +0800
commit    adce5ee721c6a844ff21dfcd8515859458fe611d (patch)
tree      31df1bcf77e6248e627c788dd31c52be127ad72e /python
parent    f19228eed89cf8e22a07a7ef7f37a5f6f8a3d455 (diff)
[SPARK-12720][SQL] SQL Generation Support for Cube, Rollup, and Grouping Sets
#### What changes were proposed in this pull request?

This PR adds SQL generation support for cube, rollup, and grouping sets. For example, a query using rollup:

```SQL
SELECT count(*) as cnt, key % 5, grouping_id()
FROM t1
GROUP BY key % 5 WITH ROLLUP
```

Original logical plan:

```
Aggregate [(key#17L % cast(5 as bigint))#47L,grouping__id#46],
          [(count(1),mode=Complete,isDistinct=false) AS cnt#43L,
           (key#17L % cast(5 as bigint))#47L AS _c1#45L,
           grouping__id#46 AS _c2#44]
+- Expand [List(key#17L, value#18, (key#17L % cast(5 as bigint))#47L, 0),
           List(key#17L, value#18, null, 1)],
          [key#17L,value#18,(key#17L % cast(5 as bigint))#47L,grouping__id#46]
   +- Project [key#17L, value#18, (key#17L % cast(5 as bigint)) AS (key#17L % cast(5 as bigint))#47L]
      +- Subquery t1
         +- Relation[key#17L,value#18] ParquetRelation
```

Converted SQL:

```SQL
SELECT count( 1) AS `cnt`,
       (`t1`.`key` % CAST(5 AS BIGINT)),
       grouping_id() AS `_c2`
FROM `default`.`t1`
GROUP BY (`t1`.`key` % CAST(5 AS BIGINT))
GROUPING SETS (((`t1`.`key` % CAST(5 AS BIGINT))), ())
```

#### How was this patch tested?

Added eight test cases in `LogicalPlanToSQLSuite`.

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes #11283 from gatorsmile/groupingSetsToSQL.
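As an illustration (not part of the patch), here is a minimal PySpark sketch of the same rollup aggregation expressed through the DataFrame API. It assumes a running SparkSession bound to `spark` and a table `t1` with a numeric `key` column, matching the example query above:

```python
# Illustrative sketch only -- assumes a SparkSession `spark` and a
# table `t1` with a numeric `key` column, as in the query above.
from pyspark.sql import functions as F

df = spark.table("t1")

# DataFrame-API counterpart of:
#   SELECT count(*) AS cnt, key % 5, grouping_id()
#   FROM t1 GROUP BY key % 5 WITH ROLLUP
result = df.rollup(df.key % 5).agg(F.count("*").alias("cnt"), F.grouping_id())
result.show()
```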
Diffstat (limited to 'python')
-rw-r--r--  python/pyspark/sql/functions.py | 14
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 92e724fef4..88924e2981 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -348,13 +348,13 @@ def grouping_id(*cols):
grouping columns).
>>> df.cube("name").agg(grouping_id(), sum("age")).orderBy("name").show()
- +-----+------------+--------+
- | name|groupingid()|sum(age)|
- +-----+------------+--------+
- | null| 1| 7|
- |Alice| 0| 2|
- | Bob| 0| 5|
- +-----+------------+--------+
+ +-----+-------------+--------+
+ | name|grouping_id()|sum(age)|
+ +-----+-------------+--------+
+ | null| 1| 7|
+ |Alice| 0| 2|
+ | Bob| 0| 5|
+ +-----+-------------+--------+
"""
sc = SparkContext._active_spark_context
jc = sc._jvm.functions.grouping_id(_to_seq(sc, cols, _to_java_column))
return Column(jc)
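For reference, a self-contained sketch that reproduces the corrected doctest output; the two sample rows are an assumption inferred from the expected table, and the `SparkSession` entry point is from later Spark releases:

```python
# Self-contained sketch; the sample rows (Alice/2, Bob/5) are inferred
# from the doctest's expected output, and SparkSession is the entry
# point from later Spark releases.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ["name", "age"])

# cube("name") emits one row per name plus a grand-total row (name = null);
# grouping_id() is 0 for per-name rows and 1 for the total row, matching
# the table shown in the updated doctest.
df.cube("name").agg(F.grouping_id(), F.sum("age")).orderBy("name").show()
```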