author Andrew Or <andrew@databricks.com> 2015-12-10 15:30:08 -0800
committer Andrew Or <andrew@databricks.com> 2015-12-10 15:30:08 -0800
commit 5030923ea8bb94ac8fa8e432de9fc7089aa93986
tree c8862be47d526f2791d3e042fcffcd871093aebb
parent 23a9e62bad9669e9ff5dc4bd714f58d12f9be0b5
[SPARK-12155][SPARK-12253] Fix executor OOM in unified memory management
**Problem.** In unified memory management, acquiring execution memory may lead to eviction of storage memory. However, the space freed by evicting cached blocks is distributed among all active tasks. Thus, an incorrect upper bound on the execution memory per task can cause the acquisition to fail, leading to OOMs and premature spills.

**Example.** Suppose total memory is 1000B, cached blocks occupy 900B, `spark.memory.storageFraction` is 0.4, and there are two active tasks. In this case, the cap on task execution memory is 100B / 2 = 50B. If task A tries to acquire 200B, it will evict 100B of storage but can only acquire 50B because of the incorrect cap. For another example, see this [regression test](https://github.com/andrewor14/spark/blob/fix-oom/core/src/test/scala/org/apache/spark/memory/UnifiedMemoryManagerSuite.scala#L233) that I stole from JoshRosen.

**Solution.** Fix the cap on task execution memory. It should take into account the space that could have been freed by storage in addition to the current amount of memory available to execution. In the example above, only 0.4 × 1000B = 400B of storage is protected from eviction, so execution could grow to 1000B - 400B = 600B, and the correct cap should have been 600B / 2 = 300B.

This patch also guards against the following race condition (SPARK-12253):

1. Existing tasks collectively occupy all execution memory.
2. A new task comes in and blocks while the existing tasks spill.
3. After the tasks finish spilling, another task jumps in and puts in a large block, stealing the freed memory.
4. The new task still cannot acquire memory and goes back to sleep.

Author: Andrew Or <andrew@databricks.com>

Closes #10240 from andrewor14/fix-oom.
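The cap arithmetic from the example can be checked with a small sketch. This is a simplified model, not the actual `UnifiedMemoryManager` code: the helpers `old_cap_per_task`, `new_cap_per_task`, and `grant` are hypothetical, storage below `storageFraction × total` is assumed to be immune to eviction, and details such as per-task minimums and whether free memory actually exists are ignored.

```python
def old_cap_per_task(total: int, storage_used: int, num_tasks: int) -> int:
    """Buggy cap (hypothetical helper): counts only the execution pool as it stands now."""
    current_execution_pool = total - storage_used
    return current_execution_pool // num_tasks


def new_cap_per_task(total: int, storage_used: int,
                     storage_fraction: float, num_tasks: int) -> int:
    """Fixed cap (hypothetical helper): also counts storage that execution could evict."""
    # Assumption: cached blocks within this region are immune to eviction.
    protected_storage = int(total * storage_fraction)
    # Execution may grow into any storage above the protected region.
    max_execution_pool = total - min(storage_used, protected_storage)
    return max_execution_pool // num_tasks


def grant(request: int, already_held: int, cap: int) -> int:
    """Bytes granted to one task, limited only by its per-task cap (simplified)."""
    return max(0, min(request, cap - already_held))


# Numbers from the example: 1000B total, 900B cached, storageFraction 0.4, 2 tasks.
total, cached, fraction, tasks = 1000, 900, 0.4, 2
old_cap = old_cap_per_task(total, cached, tasks)
new_cap = new_cap_per_task(total, cached, fraction, tasks)
print(old_cap, grant(200, 0, old_cap))  # 50 50
print(new_cap, grant(200, 0, new_cap))  # 300 200
```

Dividing the largest pool execution could reach, rather than the pool's current size, by the number of active tasks is what lets task A obtain its full 200B in this model.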
Diffstat (limited to 'sbin')
0 files changed, 0 insertions, 0 deletions