From b3ffac5178795f2d8e7908b3e77e8e89f50b5f6f Mon Sep 17 00:00:00 2001
From: Andrew Or
Date: Tue, 13 Oct 2015 13:49:59 -0700
Subject: [SPARK-10983] Unified memory manager
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This patch unifies the memory management of the storage and execution regions such that either side can borrow memory from the other. When memory pressure arises, storage will be evicted in favor of execution. To avoid regressions in cases where storage is crucial, we dynamically allocate a fraction of space for storage that execution cannot evict. Several configurations are introduced:

- **spark.memory.fraction (default 0.75)**: fraction of the heap space used for execution and storage. The lower this is, the more frequently spills and cached data eviction occur. The purpose of this config is to set aside memory for internal metadata, user data structures, and imprecise size estimation in the case of sparse, unusually large records.

- **spark.memory.storageFraction (default 0.5)**: size of the storage region within the space set aside by `spark.memory.fraction`. Cached data may only be evicted if total storage exceeds this region.

- **spark.memory.useLegacyMode (default false)**: whether to use the memory management that existed in Spark 1.5 and before. This is mainly for backward compatibility.

For a detailed description of the design, see [SPARK-10000](https://issues.apache.org/jira/browse/SPARK-10000). This patch builds on top of the `MemoryManager` interface introduced in #9000.

Author: Andrew Or

Closes #9084 from andrewor14/unified-memory-manager.
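
A minimal sketch of how the two fractions interact. It assumes the region sizes follow from plain multiplication of the heap size and ignores any fixed overhead the real memory manager may subtract. The function names `unified_memory_regions` and `evictable_storage` are illustrative only and do not appear in Spark:

```python
def unified_memory_regions(heap_bytes, memory_fraction=0.75, storage_fraction=0.5):
    """Split a heap into the regions described by the two fractions.

    Assumption: sizes come from plain multiplication; any fixed overhead
    that the real memory manager may reserve is not modelled here.
    """
    for name, value in (("memory_fraction", memory_fraction),
                        ("storage_fraction", storage_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    # Space shared by execution and storage (spark.memory.fraction).
    unified = int(heap_bytes * memory_fraction)
    # Part of the shared space that execution cannot evict
    # (spark.memory.storageFraction).
    storage_region = int(unified * storage_fraction)
    return {
        "unified": unified,
        "storage_region": storage_region,
        # Left over for internal metadata, user data structures, etc.
        "other": heap_bytes - unified,
    }


def evictable_storage(storage_used, storage_region):
    """Cached bytes that may be evicted: only the excess over the storage region."""
    return max(0, storage_used - storage_region)
```

With the defaults and a hypothetical 4 GiB heap, this gives a 3 GiB unified region, of which 1.5 GiB is protected for storage. If 2 GiB of data is cached, only the 0.5 GiB above the storage region is eligible for eviction.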
---
 docs/configuration.md | 99 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 70 insertions(+), 29 deletions(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index 154a3aee68..771d93be04 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -445,17 +445,6 @@ Apart from these, the following properties are also available, and may be useful
     met.
   </td>
 </tr>
-<tr>
-  <td><code>spark.shuffle.memoryFraction</code></td>
-  <td>0.2</td>
-  <td>
-    Fraction of Java heap to use for aggregation and cogroups during shuffles.
-    At any given time, the collective size of
-    all in-memory maps used for shuffles is bounded by this limit, beyond which the contents will
-    begin to spill to disk. If spills are often, consider increasing this value at the expense of
-    <code>spark.storage.memoryFraction</code>.
-  </td>
-</tr>
 <tr>
   <td><code>spark.shuffle.service.enabled</code></td>
   <td>false</td>
@@ -712,6 +701,76 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 </table>
 
+#### Memory Management
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+  <td><code>spark.memory.fraction</code></td>
+  <td>0.75</td>
+  <td>
+    Fraction of the heap space used for execution and storage. The lower this is, the more
+    frequently spills and cached data eviction occur. The purpose of this config is to set
+    aside memory for internal metadata, user data structures, and imprecise size estimation
+    in the case of sparse, unusually large records.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.memory.storageFraction</code></td>
+  <td>0.5</td>
+  <td>
+    The size of the storage region within the space set aside by
+    <code>spark.memory.fraction</code>. This region is not statically reserved, but dynamically
+    allocated as cache requests come in. Cached data may be evicted only if total storage exceeds
+    this region.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.memory.useLegacyMode</code></td>
+  <td>false</td>
+  <td>
+    Whether to enable the legacy memory management mode used in Spark 1.5 and before.
+    The legacy mode rigidly partitions the heap space into fixed-size regions,
+    potentially leading to excessive spilling if the application was not tuned.
+    The following deprecated memory fraction configurations are not read unless this is enabled:
+    <code>spark.shuffle.memoryFraction</code><br>
+    <code>spark.storage.memoryFraction</code><br>
+    <code>spark.storage.unrollFraction</code>
+  </td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.memoryFraction</code></td>
+  <td>0.2</td>
+  <td>
+    (deprecated) This is read only if <code>spark.memory.useLegacyMode</code> is enabled.
+    Fraction of Java heap to use for aggregation and cogroups during shuffles.
+    At any given time, the collective size of
+    all in-memory maps used for shuffles is bounded by this limit, beyond which the contents will
+    begin to spill to disk. If spills are often, consider increasing this value at the expense of
+    <code>spark.storage.memoryFraction</code>.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.storage.memoryFraction</code></td>
+  <td>0.6</td>
+  <td>
+    (deprecated) This is read only if <code>spark.memory.useLegacyMode</code> is enabled.
+    Fraction of Java heap to use for Spark's memory cache. This should not be larger than the "old"
+    generation of objects in the JVM, which by default is given 0.6 of the heap, but you can
+    increase it if you configure your own old generation size.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.storage.unrollFraction</code></td>
+  <td>0.2</td>
+  <td>
+    (deprecated) This is read only if <code>spark.memory.useLegacyMode</code> is enabled.
+    Fraction of <code>spark.storage.memoryFraction</code> to use for unrolling blocks in memory.
+    This is dynamically allocated by dropping existing blocks when there is not enough free
+    storage space to unroll the new block in its entirety.
+  </td>
+</tr>
+</table>
+
 #### Execution Behavior
 <table class="table">
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
@@ -824,15 +883,6 @@ Apart from these, the following properties are also available, and may be useful
     This setting is ignored for jobs generated through Spark Streaming's StreamingContext, since
     data may need to be rewritten to pre-existing output directories during checkpoint recovery.</td>
 </tr>
-<tr>
-  <td><code>spark.storage.memoryFraction</code></td>
-  <td>0.6</td>
-  <td>
-    Fraction of Java heap to use for Spark's memory cache. This should not be larger than the "old"
-    generation of objects in the JVM, which by default is given 0.6 of the heap, but you can
-    increase it if you configure your own old generation size.
-  </td>
-</tr>
 <tr>
   <td><code>spark.storage.memoryMapThreshold</code></td>
   <td>2m</td>
@@ -842,15 +892,6 @@ Apart from these, the following properties are also available, and may be useful
     mapping has high overhead for blocks close to or below the page size of the operating system.
   </td>
 </tr>
-<tr>
-  <td><code>spark.storage.unrollFraction</code></td>
-  <td>0.2</td>
-  <td>
-    Fraction of <code>spark.storage.memoryFraction</code> to use for unrolling blocks in memory.
-    This is dynamically allocated by dropping existing blocks when there is not enough free
-    storage space to unroll the new block in its entirety.
-  </td>
-</tr>
 <tr>
   <td><code>spark.externalBlockStore.blockManager</code></td>
   <td>org.apache.spark.storage.TachyonBlockManager</td>
-- 
cgit v1.2.3