[SPARK-7698] Cache and reuse buffers in ExecutorMemoryAllocator when using heap allocation - spark


author	Josh Rosen <joshrosen@databricks.com>	2015-05-20 16:37:11 -0700
committer	Josh Rosen <joshrosen@databricks.com>	2015-05-20 16:37:11 -0700
commit	7956dd7ab03e1542d89dd94c043f1e5131684199 (patch)
tree	a753324eb6f10972f914ad5fbab29d97b88c8e26 /python
parent	3c434cbfd0d6821e5bcf572be792b787a514018b (diff)

[SPARK-7698] Cache and reuse buffers in ExecutorMemoryAllocator when using heap allocation

When on-heap memory allocation is used, ExecutorMemoryManager should maintain a cache / pool of buffers for re-use by tasks. This will significantly improve the performance of Tungsten's new sort-shuffle for jobs with many short-lived tasks by eliminating a major source of GC.

This pull request is a minimum viable implementation of this idea. In its current form, this patch significantly improves performance on a stress test which launches huge numbers of short-lived shuffle map tasks back-to-back in the same JVM.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6227 from JoshRosen/SPARK-7698 and squashes the following commits:

fd6cb55 [Josh Rosen] SoftReference -> WeakReference
b154e86 [Josh Rosen] WIP sketch of pooling in ExecutorMemoryManager
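
As a rough illustration of the pooling idea described above, the following is a minimal sketch and not the code from this patch. The class name HeapBufferPool, the 1 MB pooling threshold, and the use of long[] as the backing buffer are all assumptions made for the example. The only details taken from the commit message are that buffers are cached for re-use and that the pool holds them through WeakReference, so the garbage collector can still reclaim pooled buffers.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a size-keyed pool of on-heap buffers; not Spark's actual code. */
final class HeapBufferPool {
    // Assumed threshold: only buffers at least this large are pooled.
    private static final int POOLING_THRESHOLD_BYTES = 1024 * 1024;

    // Freed buffers grouped by requested size. Weak references let the GC
    // reclaim pooled buffers at any time, so the pool never pins memory.
    private final Map<Integer, Deque<WeakReference<long[]>>> poolsBySize = new HashMap<>();

    private static boolean shouldPool(int sizeBytes) {
        return sizeBytes >= POOLING_THRESHOLD_BYTES;
    }

    /** Returns a pooled buffer of the requested size if one survives, else a new one. */
    public synchronized long[] allocate(int sizeBytes) {
        if (shouldPool(sizeBytes)) {
            Deque<WeakReference<long[]>> pool = poolsBySize.get(sizeBytes);
            if (pool != null) {
                while (!pool.isEmpty()) {
                    long[] buffer = pool.pop().get();
                    if (buffer != null) {
                        // Reused buffers are not zeroed; callers must not assume clean contents.
                        return buffer;
                    }
                }
                // Every reference in this pool had been cleared by the GC.
                poolsBySize.remove(sizeBytes);
            }
        }
        // Round up to whole 8-byte words.
        return new long[(sizeBytes + 7) / 8];
    }

    /** Returns a buffer to the pool; the caller must drop its own reference afterwards. */
    public synchronized void free(long[] buffer, int sizeBytes) {
        if (shouldPool(sizeBytes)) {
            poolsBySize
                .computeIfAbsent(sizeBytes, k -> new ArrayDeque<>())
                .push(new WeakReference<>(buffer));
        }
    }
}
```

In this sketch a task that frees a large buffer makes it available to the next task requesting the same size, which avoids allocating a fresh large array per task. Keying pools by exact size keeps lookup simple at the cost of no reuse across different sizes.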

Diffstat (limited to 'python')

0 files changed, 0 insertions, 0 deletions

