[SPARK-14275][SQL] Reimplement TypedAggregateExpression to DeclarativeAggregate - spark

author	Wenchen Fan <wenchen@databricks.com>	2016-04-15 12:10:00 +0800
committer	Wenchen Fan <wenchen@databricks.com>	2016-04-15 12:10:00 +0800
commit	297ba3f1b49cc37d9891a529142c553e0a5e2d62 (patch)
tree	2a61d490100de8b609a15fb52561524dddaca0e8 /python/pyspark/ml/regression.py
parent	b5c60bcdca3bcace607b204a6c196a5386e8a896 (diff)

[SPARK-14275][SQL] Reimplement TypedAggregateExpression to DeclarativeAggregate

## What changes were proposed in this pull request?

`ExpressionEncoder` is just a container for serialization and deserialization expressions. We can use these expressions to build `TypedAggregateExpression` directly, so that it fits in `DeclarativeAggregate`, which is more efficient.

One trick: each buffer serializer expression references the result object of the serialization and function call. To avoid recalculating this result object, we serialize the buffer object to a single struct field, so that a special `Expression` can evaluate the result object only once.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #12067 from cloud-fan/typed_udaf.
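
The single-struct-field trick in the commit message can be illustrated with a toy model. This is only a sketch and not Spark's Catalyst code: `CountingUpdate`, `naive_serialize`, `struct_serialize`, and `add_to_avg` are hypothetical names invented for illustration, and plain Python tuples stand in for rows and structs. It contrasts per-field serializers that each recompute the result object with a single struct serializer that computes it once.

```python
class CountingUpdate:
    """Hypothetical stand-in for the typed function call; counts its invocations."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, buffer, value):
        self.calls += 1
        return self.fn(buffer, value)


def naive_serialize(update, field_getters, buffer, value):
    # One serializer expression per buffer field: each one re-derives
    # the result object on its own, so the update runs once per field.
    return tuple(get(update(buffer, value)) for get in field_getters)


def struct_serialize(update, field_getters, buffer, value):
    # Evaluate the result object once, then write every field into a
    # single struct-valued column (modelled here as a nested tuple).
    result = update(buffer, value)
    return (tuple(get(result) for get in field_getters),)


def add_to_avg(buffer, value):
    # Assumed example aggregate: a (sum, count) buffer for an average.
    total, count = buffer
    return (total + value, count + 1)


getters = [lambda r: r[0], lambda r: r[1]]

naive = CountingUpdate(add_to_avg)
naive_row = naive_serialize(naive, getters, (10, 2), 5)

shared = CountingUpdate(add_to_avg)
struct_row = struct_serialize(shared, getters, (10, 2), 5)

print(naive_row, naive.calls)
print(struct_row, shared.calls)
```

In this model the naive layout invokes the update function once per buffer field, while the struct layout invokes it once per input row regardless of how many fields the buffer has.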

Diffstat (limited to 'python/pyspark/ml/regression.py')

0 files changed, 0 insertions, 0 deletions

