[SPARK-12813][SQL] Eliminate serialization for back to back operations - spark


author	Michael Armbrust <michael@databricks.com>	2016-01-14 17:44:56 -0800
committer	Michael Armbrust <michael@databricks.com>	2016-01-14 17:44:56 -0800
commit	cc7af86afd3e769d1e2a581f31bb3db5a3d0229f (patch)
tree	2fbd24829a347a0765a6882f98c87ac555aaa55b /core
parent	25782981cf58946dc7c186acadd2beec5d964461 (diff)

[SPARK-12813][SQL] Eliminate serialization for back to back operations

The goal of this PR is to eliminate unnecessary translations when there are back-to-back `MapPartitions` operations. In order to achieve this I also made the following simplifications:

 - Operators no longer hold encoders; instead they have only the expressions that they need. The benefits here are twofold: the expressions are visible to transformations, so they go through the normal resolution/binding process, and now that they are visible we can change them on a case-by-case basis.
 - Operators no longer have type parameters. Since the engine is responsible for its own type checking, having the types visible to the compiler was an unnecessary complication. We still leverage the Scala compiler in the companion factory when constructing a new operator, but after this the types are discarded.

Deferred to a follow-up PR:
 - Remove as much of the resolution/binding from Dataset/GroupedDataset as possible. We should still eagerly check resolution and throw an error, though, in the case of mismatches for an `as` operation.
 - Eliminate serializations in more cases by adding more cases to `EliminateSerialization`.

Author: Michael Armbrust <michael@databricks.com>

Closes #10747 from marmbrus/encoderExpressions.
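
To make the idea of skipping translations between back-to-back `MapPartitions` operations concrete, here is a minimal toy model in Python. It is not Spark's implementation and does not use any Spark API: the `Scan` and `MapPartitions` classes, the `input_is_object` / `output_is_object` flags, and the `serialize` / `deserialize` helpers are all hypothetical names invented for this sketch. It only illustrates the general shape of a rewrite that removes a serialize step that is immediately followed by a deserialize step.

```python
from dataclasses import dataclass, replace
from typing import Callable

# Counters so the effect of the rewrite can be observed.
stats = {"serialize": 0, "deserialize": 0}

def serialize(obj):
    """Toy 'object -> row' translation (a row is a 1-tuple here)."""
    stats["serialize"] += 1
    return (obj,)

def deserialize(row):
    """Toy 'row -> object' translation."""
    stats["deserialize"] += 1
    return row[0]

@dataclass(frozen=True)
class Scan:
    """Leaf of the toy plan: produces already-serialized rows."""
    rows: tuple

@dataclass(frozen=True)
class MapPartitions:
    """Toy operator: applies func to an iterator of objects."""
    func: Callable
    child: object
    input_is_object: bool = False   # True: child already yields objects
    output_is_object: bool = False  # True: parent consumes objects directly

def eliminate_serialization(plan):
    """Bottom-up rewrite: when one MapPartitions feeds another, mark the
    boundary so the intermediate serialize/deserialize pair is skipped."""
    if not isinstance(plan, MapPartitions):
        return plan
    child = eliminate_serialization(plan.child)
    if isinstance(child, MapPartitions):
        child = replace(child, output_is_object=True)
        return replace(plan, child=child, input_is_object=True)
    return replace(plan, child=child)

def execute(plan):
    """Interpret the toy plan, translating only where the flags require it."""
    if isinstance(plan, Scan):
        return iter(plan.rows)
    upstream = execute(plan.child)
    objs = upstream if plan.input_is_object else (deserialize(r) for r in upstream)
    out = plan.func(objs)
    return out if plan.output_is_object else (serialize(o) for o in out)

def run(plan):
    """Execute a plan and report how many translations it performed."""
    stats["serialize"] = stats["deserialize"] = 0
    rows = list(execute(plan))
    return rows, dict(stats)

# Two back-to-back operations over three rows.
plan = MapPartitions(
    lambda it: (x * 10 for x in it),
    MapPartitions(lambda it: (x + 1 for x in it), Scan(((1,), (2,), (3,)))),
)

naive_rows, naive_stats = run(plan)
fused_rows, fused_stats = run(eliminate_serialization(plan))
```

In this toy setup both plans produce the same rows, but the rewritten plan translates each element once on the way in and once on the way out, rather than once per operator. The flags stand in for the point made above: once the translation steps are visible to the rewrite, it can change them case by case.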

Diffstat (limited to 'core')

0 files changed, 0 insertions, 0 deletions

