From e4c51d21135978908f7f4a46683f70ef98b720ec Mon Sep 17 00:00:00 2001
From: Andrew Or
Date: Fri, 10 Jan 2014 15:09:51 -0800
Subject: Address Patrick's and Reynold's comments

Aside from trivial formatting changes, use nulls instead of Options for
DiskMapIterator, and add documentation for spark.shuffle.externalSorting
and spark.shuffle.memoryFraction. Also, set spark.shuffle.memoryFraction
to 0.3, and spark.storage.memoryFraction = 0.6.
---
 docs/configuration.md | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

(limited to 'docs/configuration.md')

diff --git a/docs/configuration.md b/docs/configuration.md
index 6717757781..c1158491f0 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -104,13 +104,24 @@ Apart from these, the following properties are also available, and may be useful
 spark.storage.memoryFraction
- 0.66
+ 0.6
 Fraction of Java heap to use for Spark's memory cache. This should not be larger than the "old"
- generation of objects in the JVM, which by default is given 2/3 of the heap, but you can increase
+ generation of objects in the JVM, which by default is given 0.6 of the heap, but you can increase
 it if you configure your own old generation size.
+
+ spark.shuffle.memoryFraction
+ 0.3
+
+ Fraction of Java heap to use for aggregation and cogroups during shuffles, if
+ spark.shuffle.externalSorting is enabled. At any given time, the collective size of
+ all in-memory maps used for shuffles is bounded by this limit, beyond which the contents will
+ begin to spill to disk. If spills are often, consider increasing this value at the expense of
+ spark.storage.memoryFraction.
+
+
 spark.mesos.coarse
 false
@@ -376,6 +387,15 @@ Apart from these, the following properties are also available, and may be useful
 If set to "true", consolidates intermediate files created during a shuffle. Creating fewer files can improve
 filesystem performance for shuffles with large numbers of reduce tasks.
 It is recommended to set this to "true" when using ext4 or xfs filesystems. On ext3, this option might degrade performance on machines with many (>8) cores due to filesystem limitations.
+
+ spark.shuffle.externalSorting
+ true
+
+ If set to "true", spills in-memory maps used for shuffles to disk when a memory threshold is reached. This
+ threshold is specified by spark.shuffle.memoryFraction. Enable this especially for memory-intensive
+ applications.
+
+
 spark.speculation
 false
--
cgit v1.2.3

From 2e393cd5fdfbf3a85fced370b5c42315e86dad49 Mon Sep 17 00:00:00 2001
From: Andrew Or
Date: Fri, 10 Jan 2014 15:45:38 -0800
Subject: Update documentation for externalSorting

---
 docs/configuration.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

(limited to 'docs/configuration.md')

diff --git a/docs/configuration.md b/docs/configuration.md
index c1158491f0..40a57c4bc6 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -391,9 +391,8 @@ Apart from these, the following properties are also available, and may be useful
 spark.shuffle.externalSorting
 true
- If set to "true", spills in-memory maps used for shuffles to disk when a memory threshold is reached. This
- threshold is specified by spark.shuffle.memoryFraction. Enable this especially for memory-intensive
- applications.
+ If set to "true", limits the amount of memory used during reduces by spilling data out to disk. This spilling
+ threshold is specified by spark.shuffle.memoryFraction.
--
cgit v1.2.3
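
Note on the two fractions documented in these patches: the following is a minimal sketch, in Python, of how they would divide a heap if each is read simply as a share of the Java heap, as the added documentation describes. The function name `memory_budgets`, the example heap size, and the check that the two shares together must not exceed the heap are illustrative assumptions, not Spark code; Spark's actual memory accounting may differ.

```python
def memory_budgets(heap_bytes, storage_fraction=0.6, shuffle_fraction=0.3):
    """Split a heap size into cache and shuffle budgets (illustration only).

    The defaults mirror the values set in the patch:
    spark.storage.memoryFraction = 0.6 and spark.shuffle.memoryFraction = 0.3.
    """
    for name, value in (("storage", storage_fraction), ("shuffle", shuffle_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} fraction must be between 0 and 1, got {value}")
    # Assumption for this sketch: the two regions should not claim more than the heap.
    if storage_fraction + shuffle_fraction > 1.0:
        raise ValueError("storage and shuffle fractions together exceed the heap")

    storage = round(heap_bytes * storage_fraction)
    shuffle = round(heap_bytes * shuffle_fraction)
    return {
        "storage": storage,      # budget for the memory cache
        "shuffle": shuffle,      # budget for in-memory shuffle maps before spilling
        "remaining": heap_bytes - storage - shuffle,
    }


if __name__ == "__main__":
    # Hypothetical 4 GiB heap with the patch defaults.
    print(memory_budgets(4 * 1024**3))
    # Trading cache space for shuffle space, as the documentation suggests
    # when spills are frequent.
    print(memory_budgets(4 * 1024**3, storage_fraction=0.5, shuffle_fraction=0.4))
```

Raising the shuffle share in this sketch lowers what is left for the cache, which is the trade-off the new `spark.shuffle.memoryFraction` text points to.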