author Dongjoon Hyun <dongjoon@apache.org> 2016-09-30 22:05:59 -0700
committer Reynold Xin <rxin@databricks.com> 2016-09-30 22:05:59 -0700
commit 15e9bbb49e00b3982c428d39776725d0dea2cdfa (patch)
tree bcdb2292b377361c612b829aaae31975b007910c
parent aef506e39a41cfe7198162c324a11ef2f01136c3 (diff)
[MINOR][DOC] Add an up-to-date description for default serialization during shuffling
## What changes were proposed in this pull request?

This PR aims to bring the doc up to date. The documentation is generally correct, but since https://issues.apache.org/jira/browse/SPARK-13926, Spark chooses Kryo as the default serializer when shuffling simple types, arrays of simple types, or string type.

## How was this patch tested?

This is a documentation update.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #15315 from dongjoon-hyun/SPARK-DOC-SERIALIZER.
-rw-r--r-- docs/tuning.md 1
1 files changed, 1 insertions, 0 deletions
diff --git a/docs/tuning.md b/docs/tuning.md
index cbf37213aa..9c43b315bb 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -45,6 +45,7 @@ and calling `conf.set("spark.serializer", "org.apache.spark.serializer.KryoSeria
This setting configures the serializer used for not only shuffling data between worker
nodes but also when serializing RDDs to disk. The only reason Kryo is not the default is because of the custom
registration requirement, but we recommend trying it in any network-intensive application.
+Since Spark 2.0.0, we internally use Kryo serializer when shuffling RDDs with simple types, arrays of simple types, or string type.
Spark automatically includes Kryo serializers for the many commonly-used core Scala classes covered
in the AllScalaRegistrar from the [Twitter chill](https://github.com/twitter/chill) library.
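To make the rule in the added doc line concrete, here is a minimal standalone sketch of how an eligibility check based on an RDD's element `ClassTag` could look. It is not Spark's internal code: `KryoEligibility`, `canUseKryo`, and `canUseKryoForShuffle` are hypothetical names. It assumes that "simple types" means JVM primitives (excluding `Unit`/void), and that for key/value shuffles both the key and the value must qualify.

```scala
import scala.reflect.{ClassTag, classTag}

// Hypothetical helper, not part of Spark's API: models the documented rule
// "simple types, arrays of simple types, or string type".
object KryoEligibility {
  // "Simple" here means a JVM primitive (int, long, double, ...), excluding void/Unit.
  private def isSimple(c: Class[_]): Boolean =
    c.isPrimitive && c != java.lang.Void.TYPE

  // True for a simple type, an array of a simple type, or String.
  def canUseKryo(ct: ClassTag[_]): Boolean = {
    val cls = ct.runtimeClass
    isSimple(cls) ||
      (cls.isArray && isSimple(cls.getComponentType)) ||
      cls == classOf[String]
  }

  // Assumption: for a key/value shuffle, both key and value types must qualify.
  def canUseKryoForShuffle[K: ClassTag, V: ClassTag]: Boolean =
    canUseKryo(classTag[K]) && canUseKryo(classTag[V])
}

object KryoEligibilityDemo {
  def main(args: Array[String]): Unit = {
    println(KryoEligibility.canUseKryoForShuffle[Int, String])         // true
    println(KryoEligibility.canUseKryoForShuffle[String, Array[Long]]) // true
    println(KryoEligibility.canUseKryoForShuffle[Int, List[Int]])      // false: List is not a simple type
  }
}
```

The check uses `ClassTag.runtimeClass` because Scala's `Int`, `Array[Long]`, and `String` erase to `int`, `long[]`, and `java.lang.String` at runtime. This makes the decision a cheap class inspection with no reflection over record contents. Any other type, such as a case class or `List`, still goes through whatever serializer `spark.serializer` configures.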