path: root/conf
author: Josh Rosen <joshrosen@databricks.com> 2015-10-08 14:53:21 -0700
committer: Yin Huai <yhuai@databricks.com> 2015-10-08 14:56:27 -0700
commit: 2816c89b6a304cb0b5214e14ebbc320158e88260
tree: 41adbf5368a298b0744a33d11588138b98bae5cb /conf
parent: 9e66a53c9955285a85c19f55c3ef62db2e1b868a
[SPARK-10988] [SQL] Reduce duplication in Aggregate2's expression rewriting logic
In `aggregate/utils.scala`, there is a substantial amount of duplication in the expression-rewriting logic. As a prerequisite to supporting imperative aggregate functions in `TungstenAggregate`, this patch refactors this file so that the same expression-rewriting logic is used for both `SortAggregate` and `TungstenAggregate`.

In order to allow both operators to use the same rewriting logic, `TungstenAggregationIterator.generateResultProjection()` has been updated so that it first evaluates all declarative aggregate functions' `evaluateExpression`s and writes the results into a temporary buffer, and then uses this temporary buffer and the grouping expressions to evaluate the final `resultExpressions`. This matches the logic in `SortAggregateIterator`, where this two-pass approach is necessary in order to support imperative aggregates. If this change turns out to cause performance regressions, then we can look into re-implementing the single-pass evaluation in a cleaner way as part of a follow-up patch.

Since the rewriting logic is now shared across both operators, this patch also extracts that logic and places it in `SparkStrategies`. This makes the rewriting logic a bit easier to follow, I think.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #9015 from JoshRosen/SPARK-10988.
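The shared rewriting step and the two-pass result projection described above can be illustrated with a small toy model. This is a hedged, language-agnostic sketch in plain Python, not Spark code: the expression classes (`Col`, `Lit`, `Add`, `AggCall`, `AggResultRef`), the `AGGS` table of declarative aggregates, and the functions `rewrite`, `collect_aggs`, `eval_result`, and `aggregate` are all hypothetical names invented for illustration. It shows result expressions being rewritten so that each aggregate call becomes a reference into a temporary buffer, then, per group, every aggregate's final value being evaluated first (pass 1) before the result expressions are evaluated over the grouping values plus that buffer (pass 2).

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical mini expression language (NOT Catalyst's Expression API).
@dataclass(frozen=True)
class Col:            # reference to a grouping column by name
    name: str

@dataclass(frozen=True)
class Lit:            # numeric literal
    value: float

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class AggCall:        # an aggregate call such as avg(x)
    func: str
    column: str

@dataclass(frozen=True)
class AggResultRef:   # slot in the temporary aggregate-result buffer
    slot: int

Expr = Union[Col, Lit, Add, AggCall, AggResultRef]

# Toy declarative aggregates: (initial buffer, update, evaluate).
AGGS = {
    "sum": (lambda: [0.0], lambda b, v: [b[0] + v], lambda b: b[0]),
    "avg": (lambda: [0.0, 0], lambda b, v: [b[0] + v, b[1] + 1],
            lambda b: b[0] / b[1]),
}

def collect_aggs(e: Expr, out: List[AggCall]) -> None:
    """Collect distinct aggregate calls in first-seen order."""
    if isinstance(e, AggCall):
        if e not in out:
            out.append(e)
    elif isinstance(e, Add):
        collect_aggs(e.left, out)
        collect_aggs(e.right, out)

def rewrite(e: Expr, slots: Dict[AggCall, int]) -> Expr:
    """Replace every aggregate call with a reference to its result slot."""
    if isinstance(e, AggCall):
        return AggResultRef(slots[e])
    if isinstance(e, Add):
        return Add(rewrite(e.left, slots), rewrite(e.right, slots))
    return e

def eval_result(e: Expr, group: Dict[str, object], agg_results: List[float]):
    """Evaluate a rewritten expression over grouping values + result buffer."""
    if isinstance(e, Col):
        return group[e.name]
    if isinstance(e, Lit):
        return e.value
    if isinstance(e, AggResultRef):
        return agg_results[e.slot]
    if isinstance(e, Add):
        return (eval_result(e.left, group, agg_results)
                + eval_result(e.right, group, agg_results))
    raise ValueError(f"expression was not rewritten: {e!r}")

def aggregate(rows, grouping: List[str], result_exprs: List[Expr]):
    aggs: List[AggCall] = []
    for e in result_exprs:
        collect_aggs(e, aggs)
    slots = {a: i for i, a in enumerate(aggs)}
    rewritten = [rewrite(e, slots) for e in result_exprs]

    # Hash-based grouping: one list of aggregation buffers per group key.
    buffers: Dict[tuple, List[list]] = {}
    for row in rows:
        key = tuple(row[g] for g in grouping)
        if key not in buffers:
            buffers[key] = [AGGS[a.func][0]() for a in aggs]
        bufs = buffers[key]
        for i, a in enumerate(aggs):
            bufs[i] = AGGS[a.func][1](bufs[i], row[a.column])

    out = []
    for key, bufs in buffers.items():
        # Pass 1: evaluate each aggregate into a temporary result buffer.
        agg_results = [AGGS[a.func][2](b) for a, b in zip(aggs, bufs)]
        # Pass 2: evaluate result expressions over grouping values + buffer.
        group = dict(zip(grouping, key))
        out.append(tuple(eval_result(e, group, agg_results) for e in rewritten))
    return out

rows = [{"k": "a", "x": 1.0}, {"k": "a", "x": 3.0}, {"k": "b", "x": 5.0}]
exprs = [Col("k"), Add(AggCall("avg", "x"), Lit(1.0)), AggCall("sum", "x")]
print(aggregate(rows, ["k"], exprs))  # [('a', 3.0, 4.0), ('b', 6.0, 5.0)]
```

Because the final projection only ever sees grouping values and a flat buffer of aggregate results, it does not need to know how each aggregate produced its value, which is the property that lets one rewriting routine serve aggregates whose final values are computed in different ways.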
Diffstat (limited to 'conf')
0 files changed, 0 insertions, 0 deletions