[SPARK-4495] Fix memory leak in JobProgressListener - spark


author	Josh Rosen <joshrosen@databricks.com>	2014-11-19 16:50:21 -0800
committer	Josh Rosen <joshrosen@databricks.com>	2014-11-19 16:50:21 -0800
commit	04d462f648aba7b18fc293b7189b86af70e421bc (patch)
tree	d5816c007919740531c942b23a5f99e23cc7c3a6 /docs/graphx-programming-guide.md
parent	c3002c4a61c4fc5b966aa384c41c3cba33de0aa6 (diff)

[SPARK-4495] Fix memory leak in JobProgressListener

This commit fixes a memory leak in JobProgressListener that I introduced in SPARK-2321 and adds a testing framework to ensure that it’s very difficult to inadvertently introduce new memory leaks.

This solution might be overkill, but the main idea is to partition JobProgressListener's state into three buckets: collections that should be empty once Spark is idle, collections that must obey some hard size limit, and collections that have a soft size limit (they can grow arbitrarily large when Spark is active but must shrink to fit within some bound after Spark becomes idle). Based on this, we can write fairly generic tests that run workloads that submit more than `spark.ui.retainedStages` stages and `spark.ui.retainedJobs` jobs, then check that these various collections' sizes obey their contracts.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #3372 from JoshRosen/SPARK-4495 and squashes the following commits:

c73fab5 [Josh Rosen] "data structures" -> collections
be72e81 [Josh Rosen] [SPARK-4495] Fix memory leaks in JobProgressListener
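
The three-bucket idea in the commit message can be illustrated with a small toy model. This is only a sketch and is not the JobProgressListener code from this commit: it is written in Python for brevity, and `ToyListener`, `check_contracts`, `run_workload`, and every field name are hypothetical. The `retained_jobs` parameter merely plays the role that `spark.ui.retainedJobs` plays in the description above.

```python
from collections import deque


class ToyListener:
    """Hypothetical stand-in for a progress listener; not Spark's API."""

    def __init__(self, retained_jobs):
        self.retained_jobs = retained_jobs
        # Bucket 1: must be empty once the system is idle.
        self.active_jobs = {}
        # Bucket 2: hard size limit, enforced at all times by the deque.
        self.completed_jobs = deque(maxlen=retained_jobs)
        # Bucket 3: soft size limit. May exceed the bound while jobs are
        # running, but must shrink back to the bound once idle.
        self.job_data = {}

    def on_job_start(self, job_id):
        self.active_jobs[job_id] = "running"
        self.job_data[job_id] = {"tasks": 0}

    def on_job_end(self, job_id):
        del self.active_jobs[job_id]
        if len(self.completed_jobs) == self.retained_jobs:
            # The oldest completed job is about to be evicted, so drop
            # its per-job data too. Forgetting this step is the kind of
            # leak the size checks below are meant to catch.
            evicted = self.completed_jobs[0]
            self.job_data.pop(evicted, None)
        self.completed_jobs.append(job_id)


def check_contracts(listener):
    """Assert each bucket's size contract; call only when idle."""
    assert len(listener.active_jobs) == 0
    assert len(listener.completed_jobs) <= listener.retained_jobs
    assert len(listener.job_data) <= listener.retained_jobs


def run_workload(listener, num_jobs):
    """Start every job before finishing any, then return the peak size
    reached by the soft-limited collection while the system was busy."""
    for job_id in range(num_jobs):
        listener.on_job_start(job_id)
    peak = len(listener.job_data)
    for job_id in range(num_jobs):
        listener.on_job_end(job_id)
    return peak


listener = ToyListener(retained_jobs=3)
peak = run_workload(listener, num_jobs=10)  # more jobs than are retained
check_contracts(listener)
```

The workload deliberately submits more jobs than `retained_jobs`, mirroring the commit's approach of exceeding the retention settings before checking sizes. Because `check_contracts` only inspects collection sizes, the same check could be reused for any workload, which is what makes this style of test generic.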

Diffstat (limited to 'docs/graphx-programming-guide.md')

0 files changed, 0 insertions, 0 deletions

