path: root/docs/configuration.md
author Patrick Wendell <pwendell@gmail.com> 2014-01-13 11:30:09 -0800
committer Patrick Wendell <pwendell@gmail.com> 2014-01-13 12:21:39 -0800
commit 5d61e051c2ad5955f0101de6f0ecdf5d243e4f5e
tree aafb1bb428b084228edc90962e1247edb4ae3255 /docs/configuration.md
parent e6ed13f255d70de422711b979447690cdab7423b
Improvements to external sorting
1. Adds the option of compressing outputs.
2. Adds batching to the serialization to prevent OOM on the read side.
3. Slight renaming of config options.
4. Use Spark's buffer size for reads in addition to writes.
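
For anyone carrying settings across this change, the rename in the diff (`spark.shuffle.externalSorting` to `spark.shuffle.external`) and the new `spark.shuffle.external.compress` option can be sketched as a small key migration. This is only an illustration, not part of Spark: `RENAMED_KEYS` and `migrate_conf` are hypothetical names, and the plain dict stands in for however the settings are actually stored.

```python
# Hypothetical helper, not part of Spark: maps the pre-commit property
# name to the name introduced by this commit.
RENAMED_KEYS = {
    "spark.shuffle.externalSorting": "spark.shuffle.external",
}


def migrate_conf(conf):
    """Return a copy of conf with renamed keys moved to their new names.

    If both the old and the new key are present, the value already set
    under the new key is kept.
    """
    migrated = dict(conf)
    for old_key, new_key in RENAMED_KEYS.items():
        if old_key in migrated:
            value = migrated.pop(old_key)
            migrated.setdefault(new_key, value)
    return migrated


old_conf = {
    "spark.shuffle.externalSorting": "true",
    "spark.broadcast.compress": "true",
}
new_conf = migrate_conf(old_conf)

# The compression option is new in this commit; the docs list its
# default as false, so only fill it in when the user has not set it.
new_conf.setdefault("spark.shuffle.external.compress", "false")
```

Keeping the rename table as data means any further renamed option is a one-line addition, and an explicitly set new-style key is never overwritten by a stale old-style one.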
Diffstat (limited to 'docs/configuration.md')
-rw-r--r-- docs/configuration.md 11
1 files changed, 9 insertions, 2 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index 40a57c4bc6..350e3145c0 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -116,7 +116,7 @@ Apart from these, the following properties are also available, and may be useful
<td>0.3</td>
<td>
Fraction of Java heap to use for aggregation and cogroups during shuffles, if
- <code>spark.shuffle.externalSorting</code> is enabled. At any given time, the collective size of
+ <code>spark.shuffle.external</code> is true. At any given time, the collective size of
all in-memory maps used for shuffles is bounded by this limit, beyond which the contents will
begin to spill to disk. If spills are often, consider increasing this value at the expense of
<code>spark.storage.memoryFraction</code>.
@@ -155,6 +155,13 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
+ <td>spark.shuffle.external.compress</td>
+ <td>false</td>
+ <td>
+ Whether to compress data spilled during shuffles.
+ </td>
+</tr>
+<tr>
<td>spark.broadcast.compress</td>
<td>true</td>
<td>
@@ -388,7 +395,7 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
- <td>spark.shuffle.externalSorting</td>
+ <td>spark.shuffle.external</td>
<td>true</td>
<td>
If set to "true", limits the amount of memory used during reduces by spilling data out to disk. This spilling