author    Sean Owen <sowen@cloudera.com>    2016-06-16 23:04:10 +0200
committer Sean Owen <sowen@cloudera.com>    2016-06-16 23:04:10 +0200
commit    457126e420e66228cc68def4bc3d87e7a282069a (patch)
tree      70a86161312c672cd7d85cedcd3aaa281d49c1f7 /docs/tuning.md
parent    36110a8306608186696c536028d2776e022d305a (diff)
[SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config
## What changes were proposed in this pull request?

Reduce the `spark.memory.fraction` default to 0.6 in order to make it fit within the default JVM old generation size (2/3 of the heap). See the JIRA discussion. This means a full cache doesn't spill into the new gen. CC andrewor14

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #13618 from srowen/SPARK-15796.
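As an illustration of the interaction this patch documents (values are examples, not part of the patch): a job that raises `spark.memory.fraction` back above the new default would also need to enlarge the old generation to match, e.g. in `spark-defaults.conf`:

```
# Illustrative overrides, not from this patch: a larger unified memory
# fraction paired with a larger old generation so the cache still fits.
spark.memory.fraction            0.75
spark.executor.extraJavaOptions  -XX:NewRatio=6
```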
Diffstat (limited to 'docs/tuning.md')
-rw-r--r--  docs/tuning.md  18
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/docs/tuning.md b/docs/tuning.md
index e73ed69ffb..1ed14091c0 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -115,12 +115,28 @@ Although there are two relevant configurations, the typical user should not need
as the default values are applicable to most workloads:
* `spark.memory.fraction` expresses the size of `M` as a fraction of the (JVM heap space - 300MB)
-(default 0.75). The rest of the space (25%) is reserved for user data structures, internal
+(default 0.6). The rest of the space (40%) is reserved for user data structures, internal
metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually
large records.
* `spark.memory.storageFraction` expresses the size of `R` as a fraction of `M` (default 0.5).
`R` is the storage space within `M` where cached blocks are immune to being evicted by execution.
+The value of `spark.memory.fraction` should be set in order to fit this amount of heap space
+comfortably within the JVM's old or "tenured" generation. Otherwise, when much of this space is
+used for caching and execution, the tenured generation will be full, which causes the JVM to
+significantly increase time spent in garbage collection. See
+<a href="https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/sizing.html">Java GC sizing documentation</a>
+for more information.
+
+The tenured generation size is controlled by the JVM's `NewRatio` parameter, which defaults to 2,
+meaning that the tenured generation is 2 times the size of the new generation (the rest of the heap).
+So, by default, the tenured generation occupies 2/3 or about 0.66 of the heap. A value of
+0.6 for `spark.memory.fraction` keeps storage and execution memory within the old generation with
+room to spare. If `spark.memory.fraction` is increased to, say, 0.8, then `NewRatio` may have to
+increase to 6 or more.
+
+`NewRatio` must be set as a JVM flag for executors, which means adding
+`spark.executor.extraJavaOptions=-XX:NewRatio=x` to a Spark job's configuration.
## Determining Memory Consumption
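The sizing arithmetic in the added paragraphs can be sanity-checked with a short sketch (the 8 GB heap is illustrative; the 300 MB reserve and the `NewRatio` old/new split are as described in the doc text above):

```python
RESERVED_MB = 300  # fixed reserved heap, per the doc's (JVM heap - 300MB) formula


def spark_memory_mb(heap_mb: float, memory_fraction: float) -> float:
    """Size of Spark's unified execution+storage region M."""
    return (heap_mb - RESERVED_MB) * memory_fraction


def old_gen_mb(heap_mb: float, new_ratio: int) -> float:
    """Tenured generation size: old gen is new_ratio times the new gen,
    so it occupies new_ratio / (new_ratio + 1) of the heap."""
    return heap_mb * new_ratio / (new_ratio + 1)


heap = 8192  # illustrative 8 GB executor heap

# The old 0.75 default overran the default old generation (NewRatio=2 -> 2/3
# of the heap), which is what this patch addresses.
assert spark_memory_mb(heap, 0.75) > old_gen_mb(heap, 2)

# The new 0.6 default fits with room to spare.
assert spark_memory_mb(heap, 0.6) < old_gen_mb(heap, 2)

# Raising the fraction to 0.8 overruns the default old generation again...
assert spark_memory_mb(heap, 0.8) > old_gen_mb(heap, 2)
# ...but fits once NewRatio is raised to 6 (old gen = 6/7 of the heap).
assert spark_memory_mb(heap, 0.8) < old_gen_mb(heap, 6)
```

With these numbers, M at the 0.6 default is about 4735 MB against an old generation of about 5461 MB, matching the "room to spare" claim in the doc text.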