author | Josh Rosen <joshrosen@databricks.com> | 2015-05-06 10:52:55 -0700 |
---|---|---|
committer | Josh Rosen <joshrosen@databricks.com> | 2015-05-06 10:52:55 -0700 |
commit | 002c12384d6ecebbb3e7fc853dbdfbc5aaa3d6a6 (patch) | |
tree | 6075be8b83ccc0249fd35e8eaad6d516ed801a4b /sql | |
parent | f2c47082c3412a4cf8cbabe12585147c5ec3ea40 (diff) | |
[SPARK-7311] Introduce internal Serializer API for determining if serializers support object relocation
This patch extends the `Serializer` interface with a new `Private` API which allows serializers to indicate whether they support relocation of serialized objects in serializer stream output.
This relocatability property is described in more detail in `Serializer.scala`, but in a nutshell, a serializer supports relocation if reordering the bytes of serialized objects in the serialization stream output is equivalent to having reordered those elements prior to serializing them. The optimized shuffle paths introduced in #4450 and #5868 both rely on serializers having this property; this patch just centralizes the logic for determining whether a serializer has it. I also added tests and comments clarifying when this works for KryoSerializer.
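To make the relocation property concrete, here is a minimal sketch that uses only JDK classes. The record format (a 4-byte length followed by UTF-8 bytes) and the names `RelocationSketch`, `serialize`, `deserialize`, and `relocate` are hypothetical and are not part of Spark. Because the format writes no stream header and keeps no state between records, copying the byte spans of records in a new order should yield the same bytes as serializing the records in that order.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical illustration only; none of these names exist in Spark.
object RelocationSketch {

  // Serialize each record as [4-byte length][UTF-8 payload].
  // Returns the stream bytes plus the (offset, length) span of every record.
  def serialize(records: Seq[String]): (Array[Byte], Vector[(Int, Int)]) = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes) // unbuffered, so bytes.size() is always current
    val spans = Vector.newBuilder[(Int, Int)]
    records.foreach { record =>
      val start = bytes.size()
      val payload = record.getBytes(StandardCharsets.UTF_8)
      out.writeInt(payload.length)
      out.write(payload)
      spans += ((start, bytes.size() - start))
    }
    (bytes.toByteArray, spans.result())
  }

  // Read records back until the input is exhausted.
  def deserialize(data: Array[Byte]): Vector[String] = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val records = Vector.newBuilder[String]
    while (in.available() > 0) {
      val payload = new Array[Byte](in.readInt())
      in.readFully(payload)
      records += new String(payload, StandardCharsets.UTF_8)
    }
    records.result()
  }

  // Reorder records by copying their serialized byte spans, without deserializing.
  def relocate(data: Array[Byte], spans: Seq[(Int, Int)], order: Seq[Int]): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    order.foreach { i =>
      val (start, len) = spans(i)
      out.write(data, start, len)
    }
    out.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val records = Vector("a", "bb", "ccc")
    val order = Seq(2, 0, 1)
    val (data, spans) = serialize(records)
    val moved = relocate(data, spans, order)
    // Relocating bytes should match serializing the reordered records.
    println(java.util.Arrays.equals(moved, serialize(order.map(records))._1))
    println(deserialize(moved))
  }
}
```

A format that writes a stream header or carries state across records (Java serialization writes a header at the start of the stream, for example) would not generally survive this kind of byte-level reordering.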
This change allows the optimizations in #4450 to be applied for shuffles that use `SparkSqlSerializer2`.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #5924 from JoshRosen/SPARK-7311 and squashes the following commits:
50a68ca [Josh Rosen] Address minor nits
0a7ebd7 [Josh Rosen] Clarify reason why SqlSerializer2 supports this serializer
123b992 [Josh Rosen] Cleanup for submitting as standalone patch.
4aa61b2 [Josh Rosen] Add missing newline
2c1233a [Josh Rosen] Small refactoring of SerializerPropertiesSuite to enable test re-use:
0ba75e6 [Josh Rosen] Add tests for serializer relocation property.
450fa21 [Josh Rosen] Back out accidental log4j.properties change
86d4dcd [Josh Rosen] Flag that SparkSqlSerializer2 supports relocation
b9624ee [Josh Rosen] Expand serializer API and use new function to help control when new UnsafeShuffle path is used.
Diffstat (limited to 'sql')
-rw-r--r-- | sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlSerializer2.scala | 5 |
1 files changed, 5 insertions, 0 deletions
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlSerializer2.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlSerializer2.scala
index 9552f41115..35ad987eb1 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlSerializer2.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlSerializer2.scala
@@ -154,6 +154,11 @@ private[sql] class SparkSqlSerializer2(keySchema: Array[DataType], valueSchema:
   with Serializable{
 
   def newInstance(): SerializerInstance = new ShuffleSerializerInstance(keySchema, valueSchema)
+
+  override def supportsRelocationOfSerializedObjects: Boolean = {
+    // SparkSqlSerializer2 is stateless and writes no stream headers
+    true
+  }
 }
 
 private[sql] object SparkSqlSerializer2 {
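
The override above opts one serializer into the relocation property. As a rough sketch of how such a flag could be consulted, the stand-in types below mimic the pattern without depending on Spark. `SketchSerializer`, `StatelessSerializer`, `StatefulSerializer`, and `chooseShufflePath` are hypothetical names, and the `false` default is an assumption chosen to be conservative; only the method name `supportsRelocationOfSerializedObjects` comes from the diff.

```scala
// Hypothetical stand-ins; these are NOT Spark's Serializer classes.
object RelocationFlagSketch {

  abstract class SketchSerializer {
    // Assumed conservative default: unsupported unless a subclass opts in.
    def supportsRelocationOfSerializedObjects: Boolean = false
  }

  // Stateless and header-free, so it opts in (mirrors the override in the diff).
  class StatelessSerializer extends SketchSerializer {
    override def supportsRelocationOfSerializedObjects: Boolean = true
  }

  // Keeps state across records, so it inherits the default.
  class StatefulSerializer extends SketchSerializer

  // Hypothetical gate: take the optimized path only when relocation is safe.
  def chooseShufflePath(serializer: SketchSerializer): String =
    if (serializer.supportsRelocationOfSerializedObjects) "optimized" else "regular"
}
```

Keeping the decision in a single method means callers do not need to know which concrete serializer they were handed.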