From f25bbbdb3ac5620850c7d09d6a63af888411ecf1 Mon Sep 17 00:00:00 2001 From: Reynold Xin Date: Sun, 7 Sep 2014 20:42:07 -0700 Subject: [SPARK-3280] Made sort-based shuffle the default implementation Sort-based shuffle has lower memory usage and seems to outperform hash-based in almost all of our testing. Author: Reynold Xin Closes #2178 from rxin/sort-shuffle and squashes the following commits: 713d341 [Reynold Xin] Fixed test failures by setting spark.shuffle.compress to the same value as spark.shuffle.spill.compress. 85165e6 [Reynold Xin] Fixed a comment typo. aa0d372 [Reynold Xin] [SPARK-3280] Made sort-based shuffle the default implementation --- docs/configuration.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) (limited to 'docs/configuration.md') diff --git a/docs/configuration.md b/docs/configuration.md index 65a422caab..36178efb97 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -293,12 +293,11 @@ Apart from these, the following properties are also available, and may be useful spark.shuffle.manager - HASH + sort - Implementation to use for shuffling data. A hash-based shuffle manager is the default, but - starting in Spark 1.1 there is an experimental sort-based shuffle manager that is more - memory-efficient in environments with small executors, such as YARN. To use that, change - this value to SORT. + Implementation to use for shuffling data. There are two implementations available: + sort and hash. Sort-based shuffle is more memory-efficient and is + the default option starting in 1.2. -- cgit v1.2.3