[SPARK-4864] Add documentation to Netty-based configs

Author: Aaron Davidson <aaron@databricks.com>

Closes #3713 from aarondav/netty-configs and squashes the following commits:

8a8b373 [Aaron Davidson] Address Patrick's comments
3b1f84e [Aaron Davidson] [SPARK-4864] Add documentation to Netty-based configs
author: Aaron Davidson <aaron@databricks.com> 2014-12-22 13:09:22 -0800
committer: Patrick Wendell <pwendell@gmail.com> 2014-12-22 13:09:22 -0800
commit: fbca6b6ce293b1997b40abeb9ab77b8a969a5fc9 (patch)
tree: e7ec5101ecb2d7fc5322c0c594183edac65afd2e /docs/configuration.md
parent: 7c0ed13d298d9cf66842c667602e2dccb8f5605b (diff)
download: spark-fbca6b6ce293b1997b40abeb9ab77b8a969a5fc9.tar.gz
spark-fbca6b6ce293b1997b40abeb9ab77b8a969a5fc9.tar.bz2
spark-fbca6b6ce293b1997b40abeb9ab77b8a969a5fc9.zip
1 files changed, 35 insertions, 0 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index 2c8dea869b..2cc013c47f 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -852,6 +852,41 @@ Apart from these, the following properties are also available, and may be useful
     between nodes leading to flooding the network with those.
   </td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.io.preferDirectBufs</code></td>
+  <td>true</td>
+  <td>
+    (Netty only) Off-heap buffers are used to reduce garbage collection during shuffle and cache 
+    block transfer. For environments where off-heap memory is tightly limited, users may wish to 
+    turn this off to force all allocations from Netty to be on-heap.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.io.numConnectionsPerPeer</code></td>
+  <td>1</td>
+  <td>
+    (Netty only) Connections between hosts are reused in order to reduce connection buildup for 
+    large clusters. For clusters with many hard disks and few hosts, this may result in insufficient
+    concurrency to saturate all disks, and so users may consider increasing this value.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.io.maxRetries</code></td>
+  <td>3</td>
+  <td>
+    (Netty only) Fetches that fail due to IO-related exceptions are automatically retried if this is
+    set to a non-zero value. This retry logic helps stabilize large shuffles in the face of long GC 
+    pauses or transient network connectivity issues.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.io.retryWait</code></td>
+  <td>5</td>
+  <td>
+    (Netty only) Seconds to wait between retries of fetches. The maximum delay caused by retrying
+    is simply <code>maxRetries * retryWait</code>, by default 15 seconds. 
+  </td>
+</tr>
 </table>
 
 #### Scheduling
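
As a rough illustration of the retry arithmetic documented in this patch, the sketch below stores the four new properties with their documented defaults and computes the worst-case delay as `maxRetries * retryWait`. The helper names (`max_retry_delay_seconds`, `to_submit_args`) are hypothetical and not part of Spark; only the property names, the defaults, and the `--conf key=value` form of `spark-submit` are taken as given.

```python
# Defaults as documented in the patch; retryWait is expressed in seconds.
NETTY_SHUFFLE_DEFAULTS = {
    "spark.shuffle.io.preferDirectBufs": "true",
    "spark.shuffle.io.numConnectionsPerPeer": "1",
    "spark.shuffle.io.maxRetries": "3",
    "spark.shuffle.io.retryWait": "5",
}


def max_retry_delay_seconds(conf):
    """Upper bound on the delay added by retrying: maxRetries * retryWait."""
    max_retries = int(conf["spark.shuffle.io.maxRetries"])
    retry_wait = int(conf["spark.shuffle.io.retryWait"])
    return max_retries * retry_wait


def to_submit_args(conf):
    """Render the settings as `--conf key=value` arguments (hypothetical helper)."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # Example override: allow more retries than the default.
    tuned = dict(NETTY_SHUFFLE_DEFAULTS)
    tuned["spark.shuffle.io.maxRetries"] = "6"

    print(max_retry_delay_seconds(NETTY_SHUFFLE_DEFAULTS))
    print(max_retry_delay_seconds(tuned))
    print(" ".join(to_submit_args(tuned)))
```

Setting `spark.shuffle.io.maxRetries` to `0` makes the computed bound zero, which matches the documented behavior that retries only happen for a non-zero value.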