From f728e0fe7e860fe6dd3437e248472a67a2d435f8 Mon Sep 17 00:00:00 2001 From: Cheng Hao Date: Thu, 18 Dec 2014 18:58:29 -0800 Subject: [SPARK-2663] [SQL] Support the Grouping Set Add support for `GROUPING SETS`, `ROLLUP`, `CUBE` and the the virtual column `GROUPING__ID`. More details on how to use the `GROUPING SETS" can be found at: https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup https://issues.apache.org/jira/secure/attachment/12676811/grouping_set.pdf The generic idea of the implementations are : 1 Replace the `ROLLUP`, `CUBE` with `GROUPING SETS` 2 Explode each of the input row, and then feed them to `Aggregate` * Each grouping set are represented as the bit mask for the `GroupBy Expression List`, for each bit, `1` means the expression is selected, otherwise `0` (left is the lower bit, and right is the higher bit in the `GroupBy Expression List`) * Several of projections are constructed according to the grouping sets, and within each projection(Seq[Expression), we replace those expressions with `Literal(null)` if it's not selected in the grouping set (based on the bit mask) * Output Schema of `Explode` is `child.output :+ grouping__id` * GroupBy Expressions of `Aggregate` is `GroupBy Expression List :+ grouping__id` * Keep the `Aggregation expressions` the same for the `Aggregate` The expressions substitutions happen in Logic Plan analyzing, so we will benefit from the Logical Plan optimization (e.g. expression constant folding, and map side aggregation etc.), Only an `Explosive` operator added for Physical Plan, which will explode the rows according the pre-set projections. A known issue will be done in the follow up PR: * Optimization `ColumnPruning` is not supported yet for `Explosive` node. Author: Cheng Hao Closes #1567 from chenghao-intel/grouping_sets and squashes the following commits: fe65fcc [Cheng Hao] Remove the extra space 3547056 [Cheng Hao] Add more doc and Simplify the Expand a7c869d [Cheng Hao] update code as feedbacks d23c672 [Cheng Hao] Add GroupingExpression to replace the Seq[Expression] 414b165 [Cheng Hao] revert the unnecessary changes ec276c6 [Cheng Hao] Support Rollup/Cube/GroupingSets --- ...pby_grouping_sets3-6-aec59088408cc57248851d3ce04e2eef | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) create mode 100644 sql/hive/src/test/resources/golden/groupby_grouping_sets3-6-aec59088408cc57248851d3ce04e2eef (limited to 'sql/hive/src/test/resources/golden/groupby_grouping_sets3-6-aec59088408cc57248851d3ce04e2eef') diff --git a/sql/hive/src/test/resources/golden/groupby_grouping_sets3-6-aec59088408cc57248851d3ce04e2eef b/sql/hive/src/test/resources/golden/groupby_grouping_sets3-6-aec59088408cc57248851d3ce04e2eef new file mode 100644 index 0000000000..b2d08949e9 --- /dev/null +++ b/sql/hive/src/test/resources/golden/groupby_grouping_sets3-6-aec59088408cc57248851d3ce04e2eef @@ -0,0 +1,16 @@ +NULL NULL 3.8333333333333335 12 +NULL 1 2.0 5 +NULL 2 5.2 5 +NULL 3 5.0 2 +1 NULL 2.6666666666666665 3 +1 1 3.0 2 +1 2 2.0 1 +2 NULL 5.2 5 +2 2 5.333333333333333 3 +2 3 5.0 2 +3 NULL 8.0 1 +3 2 8.0 1 +5 NULL 2.0 1 +5 1 2.0 1 +8 NULL 1.0 2 +8 1 1.0 2 -- cgit v1.2.3