path: root/docs/configuration.md
author    Andrew Or <andrewor14@gmail.com>    2014-01-10 15:09:51 -0800
committer Andrew Or <andrewor14@gmail.com>    2014-01-10 15:09:51 -0800
commit    e4c51d21135978908f7f4a46683f70ef98b720ec (patch)
tree      19d25b7c647fd0791454d1965cf3201430ca24a9 /docs/configuration.md
parent    372a533a6c091361115f0f0712e93ef3af376b30 (diff)
download  spark-e4c51d21135978908f7f4a46683f70ef98b720ec.tar.gz
          spark-e4c51d21135978908f7f4a46683f70ef98b720ec.tar.bz2
          spark-e4c51d21135978908f7f4a46683f70ef98b720ec.zip
Address Patrick's and Reynold's comments
Aside from trivial formatting changes, use nulls instead of Options for DiskMapIterator, and add documentation for spark.shuffle.externalSorting and spark.shuffle.memoryFraction. Also, set spark.shuffle.memoryFraction to 0.3 and spark.storage.memoryFraction to 0.6.
Diffstat (limited to 'docs/configuration.md')
-rw-r--r--  docs/configuration.md  24
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index 6717757781..c1158491f0 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -104,14 +104,25 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td>spark.storage.memoryFraction</td>
- <td>0.66</td>
+ <td>0.6</td>
<td>
Fraction of Java heap to use for Spark's memory cache. This should not be larger than the "old"
- generation of objects in the JVM, which by default is given 2/3 of the heap, but you can increase
+ generation of objects in the JVM, which by default is given 0.6 of the heap, but you can increase
it if you configure your own old generation size.
</td>
</tr>
<tr>
+ <td>spark.shuffle.memoryFraction</td>
+ <td>0.3</td>
+ <td>
+ Fraction of Java heap to use for aggregation and cogroups during shuffles, if
+ <code>spark.shuffle.externalSorting</code> is enabled. At any given time, the collective size of
+ all in-memory maps used for shuffles is bounded by this limit, beyond which the contents will
+ begin to spill to disk. If spills are frequent, consider increasing this value at the expense of
+ <code>spark.storage.memoryFraction</code>.
+ </td>
+</tr>
+<tr>
<td>spark.mesos.coarse</td>
<td>false</td>
<td>
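
The two added lines above document how the heap is split between storage and shuffle aggregation. As a hedged illustration only (the class name below is invented, not part of this patch): in the Spark release this commit targets, these settings are plain Java system properties read when the SparkContext starts, so they can be set before creating one.

```java
// Hypothetical sketch: setting the memory fractions documented in this commit.
// The class name is invented for illustration; only the property keys and the
// default values (0.6 and 0.3) come from the patch above.
public class MemoryFractionExample {
    public static void main(String[] args) {
        // Defaults from this commit: 0.6 of the heap for the storage cache,
        // 0.3 for in-memory maps used during shuffles.
        System.setProperty("spark.storage.memoryFraction", "0.6");
        System.setProperty("spark.shuffle.memoryFraction", "0.3");

        double storage = Double.parseDouble(System.getProperty("spark.storage.memoryFraction"));
        double shuffle = Double.parseDouble(System.getProperty("spark.shuffle.memoryFraction"));
        // The two fractions together should leave some heap for other objects.
        System.out.println("unreserved heap fraction: " + (1.0 - storage - shuffle));
    }
}
```

As the new documentation notes, the two fractions trade off against each other: raising one at the expense of the other keeps the total heap budget unchanged.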
@@ -377,6 +388,15 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
+ <td>spark.shuffle.externalSorting</td>
+ <td>true</td>
+ <td>
+ If set to "true", Spark spills in-memory maps used for shuffles to disk when a memory
+ threshold is reached. This threshold is specified by <code>spark.shuffle.memoryFraction</code>.
+ Enabling this is especially useful for memory-intensive applications.
+ </td>
+</tr>
+<tr>
<td>spark.speculation</td>
<td>false</td>
<td>
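
The second hunk documents the switch that gates the spilling behavior. A minimal hedged sketch (the class name is invented; only the property keys and the `true` default come from the patch): toggling external sorting alongside the fraction that triggers it.

```java
// Hypothetical sketch: reading back the spill-related settings from this commit.
// The class name is invented for illustration.
public class ExternalSortingExample {
    public static void main(String[] args) {
        // "true" is the default added by this patch; shuffle maps spill to disk
        // once the spark.shuffle.memoryFraction threshold is reached.
        System.setProperty("spark.shuffle.externalSorting", "true");
        System.setProperty("spark.shuffle.memoryFraction", "0.3");

        boolean spillEnabled =
            Boolean.parseBoolean(System.getProperty("spark.shuffle.externalSorting"));
        System.out.println("spilling enabled: " + spillEnabled);
    }
}
```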