author: Sean Owen <sowen@cloudera.com>  2016-06-16 23:04:10 +0200
committer: Sean Owen <sowen@cloudera.com>  2016-06-16 23:04:10 +0200
commit: 457126e420e66228cc68def4bc3d87e7a282069a (patch)
tree: 70a86161312c672cd7d85cedcd3aaa281d49c1f7 /docs
parent: 36110a8306608186696c536028d2776e022d305a (diff)
[SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config
## What changes were proposed in this pull request?

Reduce the `spark.memory.fraction` default to 0.6 in order to make it fit within the default JVM old generation size (2/3 of the heap). See the JIRA discussion. This means a full cache doesn't spill into the new gen.

CC andrewor14

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #13618 from srowen/SPARK-15796.
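The arithmetic behind the change can be sketched as follows. This is a hypothetical check, not part of the patch; the 4 GB heap size is an arbitrary example:

```python
def spark_unified_memory_mb(heap_mb, memory_fraction):
    # Spark's unified memory region M = (heap - 300MB reserved) * spark.memory.fraction
    return (heap_mb - 300) * memory_fraction

def old_gen_mb(heap_mb, new_ratio=2):
    # With -XX:NewRatio=2 (the HotSpot default), old gen : new gen = 2 : 1,
    # so the old generation occupies 2/3 of the heap.
    return heap_mb * new_ratio / (new_ratio + 1)

heap = 4096  # example: a 4 GB executor heap

# The old default of 0.75 overruns the default old generation...
assert spark_unified_memory_mb(heap, 0.75) > old_gen_mb(heap)
# ...while the new default of 0.6 fits within it with room to spare.
assert spark_unified_memory_mb(heap, 0.6) < old_gen_mb(heap)
```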
Diffstat (limited to 'docs')
-rw-r--r--docs/configuration.md7
-rw-r--r--docs/tuning.md18
2 files changed, 21 insertions, 4 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index 32c3a92660..fbda91c109 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -778,14 +778,15 @@ Apart from these, the following properties are also available, and may be useful
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td><code>spark.memory.fraction</code></td>
- <td>0.75</td>
+ <td>0.6</td>
<td>
Fraction of (heap space - 300MB) used for execution and storage. The lower this is, the
more frequently spills and cached data eviction occur. The purpose of this config is to set
aside memory for internal metadata, user data structures, and imprecise size estimation
in the case of sparse, unusually large records. Leaving this at the default value is
- recommended. For more detail, see <a href="tuning.html#memory-management-overview">
- this description</a>.
+ recommended. For more detail, including important information about correctly tuning JVM
+ garbage collection when increasing this value, see
+ <a href="tuning.html#memory-management-overview">this description</a>.
</td>
</tr>
<tr>
diff --git a/docs/tuning.md b/docs/tuning.md
index e73ed69ffb..1ed14091c0 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -115,12 +115,28 @@ Although there are two relevant configurations, the typical user should not need
as the default values are applicable to most workloads:
* `spark.memory.fraction` expresses the size of `M` as a fraction of the (JVM heap space - 300MB)
-(default 0.75). The rest of the space (25%) is reserved for user data structures, internal
+(default 0.6). The rest of the space (40%) is reserved for user data structures, internal
metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually
large records.
* `spark.memory.storageFraction` expresses the size of `R` as a fraction of `M` (default 0.5).
`R` is the storage space within `M` where cached blocks are immune to being evicted by execution.
+The value of `spark.memory.fraction` should be set in order to fit this amount of heap space
+comfortably within the JVM's old or "tenured" generation. Otherwise, when much of this space is
+used for caching and execution, the tenured generation will be full, which causes the JVM to
+significantly increase time spent in garbage collection. See
+<a href="https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/sizing.html">Java GC sizing documentation</a>
+for more information.
+
+The tenured generation size is controlled by the JVM's `NewRatio` parameter, which defaults to 2,
+meaning that the tenured generation is 2 times the size of the new generation (the rest of the heap).
+So, by default, the tenured generation occupies 2/3 or about 0.66 of the heap. A value of
+0.6 for `spark.memory.fraction` keeps storage and execution memory within the old generation with
+room to spare. If `spark.memory.fraction` is increased to, say, 0.8, then `NewRatio` may have to
+increase to 6 or more.
+
+`NewRatio` is set as a JVM flag for executors, which means adding
+`spark.executor.extraJavaOptions=-XX:NewRatio=x` to a Spark job's configuration.
## Determining Memory Consumption
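As a hedged illustration of the combined tuning the added text describes (the class name and jar are placeholders, not part of this patch), raising `spark.memory.fraction` together with `NewRatio` might look like:

```shell
# Hypothetical spark-submit: raise spark.memory.fraction to 0.8 and enlarge
# the old generation to match (NewRatio=6 => old gen = 6/7 of the heap).
spark-submit \
  --conf spark.memory.fraction=0.8 \
  --conf "spark.executor.extraJavaOptions=-XX:NewRatio=6" \
  --class com.example.MyApp \
  my-app.jar
```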