author: Michael Armbrust <michael@databricks.com> 2016-01-14 17:44:56 -0800
committer: Michael Armbrust <michael@databricks.com> 2016-01-14 17:44:56 -0800
commit: cc7af86afd3e769d1e2a581f31bb3db5a3d0229f (patch)
tree: 2fbd24829a347a0765a6882f98c87ac555aaa55b /core
parent: 25782981cf58946dc7c186acadd2beec5d964461 (diff)
[SPARK-12813][SQL] Eliminate serialization for back to back operations
The goal of this PR is to eliminate unnecessary translations when there are back-to-back `MapPartitions` operations. In order to achieve this I also made the following simplifications:

 - Operators no longer hold encoders; instead they have only the expressions that they need. The benefits here are twofold: the expressions are visible to transformations, so they go through the normal resolution/binding process, and now that they are visible we can change them on a case-by-case basis.
 - Operators no longer have type parameters. Since the engine is responsible for its own type checking, having the types visible to the compiler was an unnecessary complication. We still leverage the Scala compiler in the companion factory when constructing a new operator, but after this the types are discarded.

Deferred to a follow-up PR:

 - Remove as much of the resolution/binding from Dataset/GroupedDataset as possible. We should still eagerly check resolution and throw an error, though, in the case of mismatches for an `as` operation.
 - Eliminate serializations in more cases by adding more cases to `EliminateSerialization`.

Author: Michael Armbrust <michael@databricks.com>

Closes #10747 from marmbrus/encoderExpressions.
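
To illustrate the idea of removing the translation between back-to-back `MapPartitions` operations, here is a minimal toy model in Python. It is a sketch under stated assumptions, not Spark's implementation: the node classes `Source`, `Deserialize`, `Serialize` and the helper functions are hypothetical stand-ins, rows are modelled as one-element tuples, and the rule only handles the simplest case where a deserialization sits directly on top of a serialization of the same object type.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical plan nodes for illustration only; none of these are Spark classes.
@dataclass
class Source:
    rows: tuple          # already-serialized rows, modelled as 1-tuples

@dataclass
class Deserialize:
    child: object        # row -> object

@dataclass
class Serialize:
    child: object        # object -> row

@dataclass
class MapPartitions:
    func: Callable       # iterator of objects -> iterator of objects
    child: object

def map_partitions(func, child):
    """Wrap a user function the naive way: deserialize, apply, serialize."""
    return Serialize(MapPartitions(func, Deserialize(child)))

def eliminate_serialization(plan):
    """Bottom-up rewrite: Deserialize(Serialize(x)) collapses to x."""
    if isinstance(plan, Source):
        return plan
    child = eliminate_serialization(plan.child)
    if isinstance(plan, Deserialize) and isinstance(child, Serialize):
        return child.child
    if isinstance(plan, MapPartitions):
        return MapPartitions(plan.func, child)
    return type(plan)(child)

def execute(plan, stats):
    """Evaluate a plan, counting how many values cross the row/object boundary."""
    if isinstance(plan, Source):
        return list(plan.rows)
    data = execute(plan.child, stats)
    if isinstance(plan, Deserialize):
        stats["deserialize"] += len(data)
        return [row[0] for row in data]
    if isinstance(plan, Serialize):
        stats["serialize"] += len(data)
        return [(obj,) for obj in data]
    return list(plan.func(iter(data)))

# Two back-to-back operations over three rows.
source = Source(((1,), (2,), (3,)))
plan = map_partitions(lambda it: (x * 2 for x in it),
                      map_partitions(lambda it: (x + 1 for x in it), source))

naive_stats = {"serialize": 0, "deserialize": 0}
naive_result = execute(plan, naive_stats)

optimized_plan = eliminate_serialization(plan)
optimized_stats = {"serialize": 0, "deserialize": 0}
optimized_result = execute(optimized_plan, optimized_stats)
```

In this toy model both plans produce the same rows, but the rewritten plan crosses the row/object boundary once on the way in and once on the way out instead of once per operation.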
Diffstat (limited to 'core')
0 files changed, 0 insertions, 0 deletions