[SPARK-11840][SQL] Restore the 1.5's behavior of planning a single distinct aggregation. - spark

author	Yin Huai <yhuai@databricks.com>	2015-11-19 11:02:17 -0800
committer	Yin Huai <yhuai@databricks.com>	2015-11-19 11:02:17 -0800
commit	962878843b611fa6229e3ee67bb22e2a4bc283cd (patch)
tree	5c3ce747a78f6877017c85c59b886aa806dd65f8 /network
parent	f449992009becc8f7c7f06cda522b9beaa1e263c (diff)

[SPARK-11840][SQL] Restore the 1.5's behavior of planning a single distinct aggregation.

The impact of this change is for a query that has a single distinct column and does not have any grouping expression, like `SELECT COUNT(DISTINCT a) FROM table`. The plan will be changed from

```
AGG-2 (count distinct)
  Shuffle to a single reducer
    Partial-AGG-2 (count distinct)
      AGG-1 (grouping on a)
        Shuffle by a
          Partial-AGG-1 (grouping on a)
```

to the following one (1.5 uses this)

```
AGG-2
  AGG-1 (grouping on a)
    Shuffle to a single reducer
      Partial-AGG-1 (grouping on a)
```

The first plan is more robust. However, to better benchmark the impact of this change, we should use 1.5's plan and use the conf `spark.sql.specializeSingleDistinctAggPlanning` to control the plan.

Author: Yin Huai <yhuai@databricks.com>

Closes #9828 from yhuai/distinctRewriter.
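For readers comparing the two plans, here is a minimal plain-Python sketch of how each one could compute `COUNT(DISTINCT a)`. It is an illustration only, not Spark's implementation: the function names, the `num_reducers` parameter, and the sample data are all hypothetical, and partitions are modeled as in-memory lists. It also assumes the usual SQL rule that `COUNT(DISTINCT ...)` ignores NULLs.

```python
def partial_dedup(partition):
    """Partial-AGG-1: drop duplicate (and NULL) values of `a` within one partition."""
    return {a for a in partition if a is not None}


def plan_shuffle_by_a(partitions, num_reducers=3):
    """First plan: shuffle by `a`, dedup per reducer, then combine partial counts."""
    reducers = [set() for _ in range(num_reducers)]
    for part in partitions:
        for a in partial_dedup(part):
            # "Shuffle by a": equal values always land on the same reducer,
            # so each reducer's set is its AGG-1 result.
            reducers[hash(a) % num_reducers].add(a)
    # Partial-AGG-2: each reducer emits one partial count.
    partial_counts = [len(r) for r in reducers]
    # "Shuffle to a single reducer" + AGG-2: only num_reducers numbers move.
    return sum(partial_counts)


def plan_single_reducer(partitions):
    """Second plan (the 1.5 one): send every partially deduped value to one reducer."""
    reducer = set()
    for part in partitions:
        # "Shuffle to a single reducer" + AGG-1: one task sees all distinct values.
        reducer |= partial_dedup(part)
    # AGG-2: count the deduplicated values.
    return len(reducer)


# Hypothetical input: three partitions of column `a`, with duplicates and a NULL.
partitions = [[1, 2, 2, 3], [3, 4, None], [1, 5, 5]]
first = plan_shuffle_by_a(partitions)
second = plan_single_reducer(partitions)
```

Both functions return the same count. In this sketch the difference is where the data ends up: the second plan funnels every distinct value of `a` through one reducer, while the first plan sends only one partial count per reducer to the final step. That may be one reason the message calls the first plan more robust, though the commit itself does not say so.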

Diffstat (limited to 'network')

0 files changed, 0 insertions, 0 deletions

