author | Wenchen Fan <wenchen@databricks.com> | 2016-04-15 12:10:00 +0800 |
---|---|---|
committer | Wenchen Fan <wenchen@databricks.com> | 2016-04-15 12:10:00 +0800 |
commit | 297ba3f1b49cc37d9891a529142c553e0a5e2d62 (patch) | |
tree | 2a61d490100de8b609a15fb52561524dddaca0e8 /python/pyspark/ml/regression.py | |
parent | b5c60bcdca3bcace607b204a6c196a5386e8a896 (diff) | |
[SPARK-14275][SQL] Reimplement TypedAggregateExpression to DeclarativeAggregate
## What changes were proposed in this pull request?
`ExpressionEncoder` is just a container for serialization and deserialization expressions. We can use these expressions to build `TypedAggregateExpression` directly, so that it fits into `DeclarativeAggregate`, which is more efficient.
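As a rough illustration of what "declarative" means here, the sketch below describes an aggregate purely as per-slot expressions over a buffer, loosely mirroring the `initialValues`, `updateExpressions`, `mergeExpressions` and `evaluateExpression` members of `DeclarativeAggregate`. It is plain Python, not Spark code: `ToyDeclarativeAggregate`, `run` and `average` are hypothetical names, and lambdas stand in for Catalyst expressions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Buffer = Tuple[float, ...]


@dataclass(frozen=True)
class ToyDeclarativeAggregate:
    """Hypothetical stand-in: an aggregate given only as expressions per buffer slot."""
    initial_values: Buffer
    update_exprs: Sequence[Callable[[Buffer, float], float]]
    merge_exprs: Sequence[Callable[[Buffer, Buffer], float]]
    evaluate_expr: Callable[[Buffer], float]

    def update(self, buf: Buffer, row: float) -> Buffer:
        # One expression per buffer slot, each reading the old buffer and the input row.
        return tuple(expr(buf, row) for expr in self.update_exprs)

    def merge(self, left: Buffer, right: Buffer) -> Buffer:
        # One expression per buffer slot, each combining two partial buffers.
        return tuple(expr(left, right) for expr in self.merge_exprs)


def run(agg: ToyDeclarativeAggregate, partitions) -> float:
    """Aggregate each partition separately, then merge the partial buffers."""
    merged = agg.initial_values
    for partition in partitions:
        buf = agg.initial_values
        for row in partition:
            buf = agg.update(buf, row)
        merged = agg.merge(merged, buf)
    return agg.evaluate_expr(merged)


# An average kept as a (sum, count) buffer.
average = ToyDeclarativeAggregate(
    initial_values=(0.0, 0.0),
    update_exprs=(lambda b, x: b[0] + x, lambda b, x: b[1] + 1.0),
    merge_exprs=(lambda l, r: l[0] + r[0], lambda l, r: l[1] + r[1]),
    evaluate_expr=lambda b: b[0] / b[1],
)
```

Because the aggregate is only a set of expressions, the engine is free to inspect and compile them together with the rest of the query, instead of calling into opaque update and merge methods.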
One trick: every buffer serializer expression references the result object of serialization and the function call. To avoid recalculating this result object for each expression, we serialize the buffer object into a single struct field, so that a special `Expression` can evaluate the result object only once.
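The effect of that trick can be sketched in plain Python, under the assumption that each buffer field has its own serializer and that all of them read the same result object. Nothing below is Spark API: `Point`, `CountingReduce`, `FIELD_SERIALIZERS` and both `update_*` functions are hypothetical names used only to count how often the result object is computed.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


class CountingReduce:
    """Hypothetical user function that records how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self, buf: Point, value: int) -> Point:
        self.calls += 1
        return Point(buf.x + value, buf.y + 2 * value)


# One serializer per buffer field; each one reads the same result object.
FIELD_SERIALIZERS = (lambda p: p.x, lambda p: p.y)


def update_per_field(reduce_fn, buf: Point, value: int):
    # Naive layout: one buffer slot per field, so every slot's expression
    # re-derives the result object before extracting its field.
    return tuple(ser(reduce_fn(buf, value)) for ser in FIELD_SERIALIZERS)


def update_single_struct(reduce_fn, buf: Point, value: int):
    # Single-struct layout: compute the result object once, then build one
    # struct-valued slot from it.
    result = reduce_fn(buf, value)
    return (tuple(ser(result) for ser in FIELD_SERIALIZERS),)
```

With two fields, the per-field layout invokes the user function twice per update, while the single-struct layout invokes it once and still yields the same field values, nested one level deeper.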
## How was this patch tested?
Existing tests.
Author: Wenchen Fan <wenchen@databricks.com>
Closes #12067 from cloud-fan/typed_udaf.