[SPARK-4495] Fix memory leak in JobProgressListener - spark


author	Josh Rosen <joshrosen@databricks.com>	2014-11-19 16:50:21 -0800
committer	Josh Rosen <joshrosen@databricks.com>	2014-11-19 16:50:44 -0800
commit	a7c64cc8f939b6c777e296f775d68fb7088a7530
tree	e760e66bbe5ad2fe04b1ef81b6e4629fc9be5dbd /project
parent	a250ca369208b23503d7fff1cf9ee52e2e1ba3e2

[SPARK-4495] Fix memory leak in JobProgressListener

This commit fixes a memory leak in JobProgressListener that I introduced in SPARK-2321 and adds a testing framework to ensure that it’s very difficult to inadvertently introduce new memory leaks.

This solution might be overkill, but the main idea is to partition JobProgressListener's state into three buckets: collections that should be empty once Spark is idle, collections that must obey some hard size limit, and collections that have a soft size limit (they can grow arbitrarily large when Spark is active but must shrink to fit within some bound after Spark becomes idle).

Based on this, we can write fairly generic tests that run workloads submitting more than `spark.ui.retainedStages` stages and `spark.ui.retainedJobs` jobs, then check that these various collections' sizes obey their contracts.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #3372 from JoshRosen/SPARK-4495 and squashes the following commits:

c73fab5 [Josh Rosen] "data structures" -> collections
be72e81 [Josh Rosen] [SPARK-4495] Fix memory leaks in JobProgressListener

(cherry picked from commit 04d462f648aba7b18fc293b7189b86af70e421bc)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
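The three-bucket contract described in the commit message can be illustrated with a minimal Scala sketch. This is not Spark's JobProgressListener or its test suite: it assumes a toy listener that tracks only jobs, and ToyJobListener, ListenerContracts, ToyListenerDemo, and every field name below are hypothetical. Each bucket exposes its collections' sizes by name, so one generic check can verify every collection against its contract without knowing what the collection stores.

```scala
import scala.collection.mutable

// Hypothetical toy listener (NOT Spark's JobProgressListener). Its state is
// split into the three buckets described in the commit message.
class ToyJobListener(val retainedJobs: Int) {
  // Bucket 1: must be empty once the system is idle.
  val activeJobs = mutable.HashMap[Int, String]()
  // Bucket 2: hard limit; never holds more than retainedJobs entries.
  val completedJobs = mutable.ArrayBuffer[Int]()
  // Bucket 3: soft limit; may exceed retainedJobs while jobs are running,
  // but must shrink to at most retainedJobs once every job has finished.
  val jobIdToStatus = mutable.HashMap[Int, String]()

  def onJobStart(jobId: Int): Unit = {
    activeJobs(jobId) = "RUNNING"
    jobIdToStatus(jobId) = "RUNNING"
  }

  def onJobEnd(jobId: Int): Unit = {
    activeJobs.remove(jobId)
    jobIdToStatus(jobId) = "SUCCEEDED"
    completedJobs += jobId
    // Evict the oldest completed jobs, and their per-job data, past the limit.
    val excess = completedJobs.size - retainedJobs
    if (excess > 0) {
      completedJobs.take(excess).foreach(id => jobIdToStatus.remove(id))
      completedJobs.remove(0, excess)
    }
  }

  // Sizes grouped by contract, keyed by collection name.
  def emptyWhenIdle: Map[String, Int] = Map("activeJobs" -> activeJobs.size)
  def hardLimited: Map[String, Int] = Map("completedJobs" -> completedJobs.size)
  def softLimited: Map[String, Int] = Map("jobIdToStatus" -> jobIdToStatus.size)
}

object ListenerContracts {
  // Returns the names of collections that violate their contract.
  // Hard limits are checked at all times; the other two only when idle.
  def violations(l: ToyJobListener, idle: Boolean): Seq[String] = {
    val hard = l.hardLimited.toSeq.collect { case (n, s) if s > l.retainedJobs => n }
    val empty =
      if (idle) l.emptyWhenIdle.toSeq.collect { case (n, s) if s != 0 => n }
      else Seq.empty[String]
    val soft =
      if (idle) l.softLimited.toSeq.collect { case (n, s) if s > l.retainedJobs => n }
      else Seq.empty[String]
    hard ++ empty ++ soft
  }
}

object ToyListenerDemo {
  def main(args: Array[String]): Unit = {
    val listener = new ToyJobListener(retainedJobs = 3)
    // Workload: run more jobs than the retention limit, all at once.
    (1 to 10).foreach(id => listener.onJobStart(id))
    println(s"while active: ${ListenerContracts.violations(listener, idle = false)}")
    (1 to 10).foreach(id => listener.onJobEnd(id))
    println(s"once idle: ${ListenerContracts.violations(listener, idle = true)}")
  }
}
```

In this sketch, the soft-limited map legitimately holds ten entries while all ten jobs run, so checking it only once the listener is idle is what separates "large because busy" from "leaking".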

Diffstat (limited to 'project')

0 files changed, 0 insertions, 0 deletions

