Diffstat (limited to 'docs/configuration.md')
-rw-r--r-- | docs/configuration.md | 23 |
1 files changed, 21 insertions, 2 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index b1a0e19167..ad75e06fc7 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -104,14 +104,25 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td>spark.storage.memoryFraction</td>
-  <td>0.66</td>
+  <td>0.6</td>
   <td>
     Fraction of Java heap to use for Spark's memory cache. This should not be larger than the "old"
-    generation of objects in the JVM, which by default is given 2/3 of the heap, but you can increase
+    generation of objects in the JVM, which by default is given 0.6 of the heap, but you can increase
     it if you configure your own old generation size.
   </td>
 </tr>
 <tr>
+  <td>spark.shuffle.memoryFraction</td>
+  <td>0.3</td>
+  <td>
+    Fraction of Java heap to use for aggregation and cogroups during shuffles, if
+    <code>spark.shuffle.externalSorting</code> is enabled. At any given time, the collective size of
+    all in-memory maps used for shuffles is bounded by this limit, beyond which the contents will
+    begin to spill to disk. If spills are often, consider increasing this value at the expense of
+    <code>spark.storage.memoryFraction</code>.
+  </td>
+</tr>
+<tr>
   <td>spark.mesos.coarse</td>
   <td>false</td>
   <td>
@@ -377,6 +388,14 @@ Apart from these, the following properties are also available, and may be useful
   </td>
 </tr>
 <tr>
+  <td>spark.shuffle.externalSorting</td>
+  <td>true</td>
+  <td>
+    If set to "true", limits the amount of memory used during reduces by spilling data out to disk. This spilling
+    threshold is specified by <code>spark.shuffle.memoryFraction</code>.
+  </td>
+</tr>
+<tr>
   <td>spark.speculation</td>
   <td>false</td>
   <td>
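
Note on the fractions in this patch: the two defaults (0.6 for `spark.storage.memoryFraction`, 0.3 for `spark.shuffle.memoryFraction`) are both fractions of the same Java heap, so raising one "at the expense of" the other means keeping their sum at or below 1. The sketch below is illustrative arithmetic only and is not Spark code. The 4 GiB heap and the `memory_budgets` helper are assumptions made up for the example, and any extra safety margins Spark may apply internally are not modeled.

```python
# Illustrative arithmetic only. The fractions are the defaults shown in the diff;
# the heap size and the helper name are hypothetical.
DEFAULTS = {
    "spark.storage.memoryFraction": 0.6,
    "spark.shuffle.memoryFraction": 0.3,
}


def memory_budgets(heap_bytes, fractions=DEFAULTS):
    """Return the nominal byte budget that each fraction carves out of the heap."""
    total = sum(fractions.values())
    if total > 1.0:
        # Both fractions draw on the same heap, so together they cannot exceed it.
        raise ValueError(f"fractions sum to {total:.2f}, which exceeds the heap")
    return {name: int(heap_bytes * frac) for name, frac in fractions.items()}


heap = 4 * 1024**3  # hypothetical 4 GiB heap
budgets = memory_budgets(heap)
for name, size in budgets.items():
    print(f"{name}: {size / 1024**2:.0f} MiB")
```

With these defaults, roughly 10% of the heap is left for everything else. Raising the shuffle fraction to reduce spilling would, under this simple model, require lowering the storage fraction by a similar amount.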