aboutsummaryrefslogtreecommitdiff
path: root/yarn
diff options
context:
space:
mode:
authorReynold Xin <rxin@databricks.com>2016-05-28 14:14:36 -0700
committerYin Huai <yhuai@databricks.com>2016-05-28 14:14:36 -0700
commit472f16181d199684996a156b0e429bc525d65a57 (patch)
treeebf48b3faa627db5da3599aaf205930fb47fd1e9 /yarn
parent74c1b79f3f82751d166bccba877501a8cabc9b7c (diff)
downloadspark-472f16181d199684996a156b0e429bc525d65a57.tar.gz
spark-472f16181d199684996a156b0e429bc525d65a57.tar.bz2
spark-472f16181d199684996a156b0e429bc525d65a57.zip
[SPARK-15636][SQL] Make aggregate expressions more concise in explain
## What changes were proposed in this pull request? This patch reduces the verbosity of aggregate expressions in explain (but does not actually remove any information). As an example, for the following command: ``` spark.range(10).selectExpr("sum(id) + 1", "count(distinct id)").explain(true) ``` Output before this patch: ``` == Physical Plan == *TungstenAggregate(key=[], functions=[(sum(id#0L),mode=Final,isDistinct=false),(count(id#0L),mode=Final,isDistinct=true)], output=[(sum(id) + 1)#3L,count(DISTINCT id)#16L]) +- Exchange SinglePartition, None +- *TungstenAggregate(key=[], functions=[(sum(id#0L),mode=PartialMerge,isDistinct=false),(count(id#0L),mode=Partial,isDistinct=true)], output=[sum#18L,count#21L]) +- *TungstenAggregate(key=[id#0L], functions=[(sum(id#0L),mode=PartialMerge,isDistinct=false)], output=[id#0L,sum#18L]) +- Exchange hashpartitioning(id#0L, 5), None +- *TungstenAggregate(key=[id#0L], functions=[(sum(id#0L),mode=Partial,isDistinct=false)], output=[id#0L,sum#18L]) +- *Range (0, 10, splits=2) ``` Output after this patch: ``` == Physical Plan == *TungstenAggregate(key=[], functions=[sum(id#0L),count(distinct id#0L)], output=[(sum(id) + 1)#3L,count(DISTINCT id)#16L]) +- Exchange SinglePartition, None +- *TungstenAggregate(key=[], functions=[merge_sum(id#0L),partial_count(distinct id#0L)], output=[sum#18L,count#21L]) +- *TungstenAggregate(key=[id#0L], functions=[merge_sum(id#0L)], output=[id#0L,sum#18L]) +- Exchange hashpartitioning(id#0L, 5), None +- *TungstenAggregate(key=[id#0L], functions=[partial_sum(id#0L)], output=[id#0L,sum#18L]) +- *Range (0, 10, splits=2) ``` Note the change from `(sum(id#0L),mode=PartialMerge,isDistinct=false)` to `merge_sum(id#0L)`. In general aggregate explain is still very verbose, but further work will be done as follow-up pull requests. ## How was this patch tested? Tested manually. Author: Reynold Xin <rxin@databricks.com> Closes #13367 from rxin/SPARK-15636.
Diffstat (limited to 'yarn')
0 files changed, 0 insertions, 0 deletions