author | Yin Huai <yhuai@databricks.com> | 2015-11-19 11:02:17 -0800 |
---|---|---|
committer | Yin Huai <yhuai@databricks.com> | 2015-11-19 11:02:17 -0800 |
commit | 962878843b611fa6229e3ee67bb22e2a4bc283cd (patch) | |
tree | 5c3ce747a78f6877017c85c59b886aa806dd65f8 /sql | |
parent | f449992009becc8f7c7f06cda522b9beaa1e263c (diff) | |
[SPARK-11840][SQL] Restore the 1.5's behavior of planning a single distinct aggregation.
This change affects queries that have a single distinct column and no grouping expressions, such as
`SELECT COUNT(DISTINCT a) FROM table`
The plan will be changed from
```
AGG-2 (count distinct)
  Shuffle to a single reducer
    Partial-AGG-2 (count distinct)
      AGG-1 (grouping on a)
        Shuffle by a
          Partial-AGG-1 (grouping on a)
```
to the following one (1.5 uses this)
```
AGG-2
  AGG-1 (grouping on a)
    Shuffle to a single reducer
      Partial-AGG-1 (grouping on a)
```
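The difference between the two plans can be sketched with plain Scala collections. This is only an illustrative model, not Spark code: `DistinctPlanSketch`, `partitions`, `shuffleByKey`, and `singleReducer` are hypothetical names, and the "rows sent to the single reducer" figure is one plausible way to read "more robust", not something the commit message states.

```scala
object DistinctPlanSketch {
  // Hypothetical input: each inner Seq stands for one input partition of column `a`.
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 2, 3), Seq(3, 4, 4), Seq(1, 5))

  // Models the plan that shuffles by `a` first.
  // Returns (distinct count, number of rows the single reducer receives).
  def shuffleByKey(parts: Seq[Seq[Int]], numReducers: Int): (Long, Int) = {
    // Partial-AGG-1: drop duplicates inside each input partition.
    val partial1 = parts.map(_.distinct)
    // Shuffle by a: equal values always land on the same reducer.
    val byReducer = partial1.flatten
      .groupBy(v => ((v % numReducers) + numReducers) % numReducers)
      .values.toSeq
    // AGG-1 followed by Partial-AGG-2: dedupe on each reducer, then count locally.
    val partialCounts = byReducer.map(_.distinct.size.toLong)
    // Shuffle to a single reducer + AGG-2: only the partial counts are shipped and summed.
    (partialCounts.sum, partialCounts.size)
  }

  // Models the 1.5-style plan.
  // Returns (distinct count, number of rows the single reducer receives).
  def singleReducer(parts: Seq[Seq[Int]]): (Long, Int) = {
    // Partial-AGG-1: drop duplicates inside each input partition.
    val partial1 = parts.map(_.distinct)
    // Shuffle to a single reducer: every partially deduplicated value is shipped.
    val shuffled = partial1.flatten
    // AGG-1 dedupes on that reducer; AGG-2 counts the result.
    (shuffled.distinct.size.toLong, shuffled.size)
  }
}
```

Both functions return the same distinct count. In this model the single reducer of the first plan receives one partial count per upstream reducer, while in the 1.5-style plan it receives every value that survived the per-partition deduplication.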
The first plan is more robust. However, to better benchmark the impact of this change, we should use 1.5's plan and let the `spark.sql.specializeSingleDistinctAggPlanning` conf control which plan is used.
Author: Yin Huai <yhuai@databricks.com>
Closes #9828 from yhuai/distinctRewriter.
Diffstat (limited to 'sql')
-rw-r--r-- | sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala
index c0c960471a..9c78f6d4cc 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala
@@ -126,8 +126,8 @@ case class DistinctAggregationRewriter(conf: CatalystConf) extends Rule[LogicalP
     val shouldRewrite = if (conf.specializeSingleDistinctAggPlanning) {
       // When the flag is set to specialize single distinct agg planning,
       // we will rely on our Aggregation strategy to handle queries with a single
-      // distinct column and this aggregate operator does have grouping expressions.
-      distinctAggGroups.size > 1 || (distinctAggGroups.size == 1 && a.groupingExpressions.isEmpty)
+      // distinct column.
+      distinctAggGroups.size > 1
     } else {
       distinctAggGroups.size >= 1
     }
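The effect of the two changed lines can be checked with a small standalone model of the predicate. This is a sketch under stated assumptions, not the rule itself: `ShouldRewriteSketch`, `before`, `after`, and their parameters are hypothetical stand-ins for `conf.specializeSingleDistinctAggPlanning`, `distinctAggGroups.size`, and `a.groupingExpressions.nonEmpty`.

```scala
object ShouldRewriteSketch {
  // Predicate as it read before this commit (the removed lines of the hunk).
  // specialize:     stand-in for conf.specializeSingleDistinctAggPlanning
  // distinctGroups: stand-in for distinctAggGroups.size
  // hasGrouping:    stand-in for a.groupingExpressions.nonEmpty
  def before(specialize: Boolean, distinctGroups: Int, hasGrouping: Boolean): Boolean =
    if (specialize) distinctGroups > 1 || (distinctGroups == 1 && !hasGrouping)
    else distinctGroups >= 1

  // Predicate after this commit (the added lines): with the flag set, a single
  // distinct group is never rewritten, with or without grouping expressions.
  def after(specialize: Boolean, distinctGroups: Int): Boolean =
    if (specialize) distinctGroups > 1
    else distinctGroups >= 1
}
```

The only input on which the two versions disagree is the case from the commit message: flag set, one distinct group, no grouping expressions. There `before` returns `true` (the rewriter handles the query) and `after` returns `false` (the query is left to the Aggregation strategy).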