From a76bde9dae54c4641e21f3c1ceb4870e3dc91881 Mon Sep 17 00:00:00 2001
From: Holden Karau
Date: Thu, 10 Sep 2015 11:49:53 -0700
Subject: [SPARK-10469] [DOC] Try and document the three options

From JIRA:
Add documentation for tungsten-sort.
From the mailing list "I saw a new "spark.shuffle.manager=tungsten-sort" implemented in
https://issues.apache.org/jira/browse/SPARK-7081, but it can't be found its
corresponding description in
http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html (Currently
there are only 'sort' and 'hash' two options)."

Author: Holden Karau

Closes #8638 from holdenk/SPARK-10469-document-tungsten-sort.
---
 docs/configuration.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index e287591f3f..0b1a273916 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -447,9 +447,12 @@ Apart from these, the following properties are also available, and may be useful
   spark.shuffle.manager
   sort
-    Implementation to use for shuffling data. There are two implementations available:
-    sort and hash. Sort-based shuffle is more memory-efficient and is
-    the default option starting in 1.2.
+    Implementation to use for shuffling data. There are three implementations available:
+    sort, hash and the new (1.5+) tungsten-sort.
+    Sort-based shuffle is more memory-efficient and is the default option starting in 1.2.
+    Tungsten-sort is similar to the sort based shuffle, with a direct binary cache-friendly
+    implementation with a fall back to regular sort based shuffle if its requirements are not
+    met.
-- 
cgit v1.2.3