path: root/mllib/src
author: Josh Rosen <joshrosen@databricks.com> 2015-04-07 16:18:55 -0700
committer: Josh Rosen <joshrosen@databricks.com> 2015-04-07 16:18:55 -0700
commit: c83e03948b184ffb3a9418fecc4d2c26ae33b057
tree: 609341e9565d783ca8871b8041a578b9cab7b760 /mllib/src
parent: 77bcceb9f01e97cb6f41791f2167b40c4311f701
[SPARK-6737] Fix memory leak in OutputCommitCoordinator

This patch fixes a memory leak in the DAGScheduler, which caused us to leak a map entry per submitted stage. The problem is that the OutputCommitCoordinator needs to be informed when stages end in order to remove entries from its `authorizedCommitters` map, but the DAGScheduler only called it in one of the four code paths that are used to mark stages as completed.

This patch fixes this issue by consolidating the processing of stage completion into a new `markStageAsFinished` method and updates DAGSchedulerSuite's `assertDataStructuresEmpty` assertion to also check the OutputCommitCoordinator data structures. I've also added a comment at the top of DAGScheduler so that we remember to update this test when adding new data structures.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #5397 from JoshRosen/SPARK-6737 and squashes the following commits:

af3b02f [Josh Rosen] Consolidate stage completion handling code in a single method.
e96ce3a [Josh Rosen] Consolidate stage completion handling code in a single method.
3052aea [Josh Rosen] Comment update
7896899 [Josh Rosen] Fix SPARK-6737 by informing OutputCommitCoordinator of all stage end events.
4ead1dc [Josh Rosen] Add regression tests for SPARK-6737
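
The shape of the fix can be illustrated with a minimal Python sketch. This is not the Spark code: the classes below are simplified stand-ins, and every name other than `markStageAsFinished`, `authorizedCommitters`, and `assertDataStructuresEmpty` (rendered here in snake_case) is an assumption made for illustration, including the four completion paths. The point is only that when every completion path goes through one method, the coordinator is always told that the stage ended.

```python
class OutputCommitCoordinator:
    """Simplified stand-in: keeps one map entry per running stage."""

    def __init__(self):
        # stage id -> {partition id: authorized task attempt}
        self.authorized_committers = {}

    def stage_start(self, stage_id):
        self.authorized_committers[stage_id] = {}

    def stage_end(self, stage_id):
        # Dropping the entry here is what prevents the leak.
        self.authorized_committers.pop(stage_id, None)

    def is_empty(self):
        return not self.authorized_committers


class Scheduler:
    """Simplified stand-in for a scheduler with several completion paths."""

    def __init__(self, coordinator):
        self.coordinator = coordinator
        self.running_stages = set()

    def submit_stage(self, stage_id):
        self.running_stages.add(stage_id)
        self.coordinator.stage_start(stage_id)

    def mark_stage_as_finished(self, stage_id, error_message=None):
        # Single place for all end-of-stage bookkeeping.
        self.running_stages.discard(stage_id)
        self.coordinator.stage_end(stage_id)

    # Hypothetical completion paths; each one delegates to the same method.
    def on_success(self, stage_id):
        self.mark_stage_as_finished(stage_id)

    def on_fetch_failure(self, stage_id):
        self.mark_stage_as_finished(stage_id, "fetch failure")

    def on_cancel(self, stage_id):
        self.mark_stage_as_finished(stage_id, "cancelled")

    def on_abort(self, stage_id):
        self.mark_stage_as_finished(stage_id, "aborted")


def assert_data_structures_empty(scheduler):
    # Mirrors the idea of the test: check the coordinator's state as well.
    assert not scheduler.running_stages
    assert scheduler.coordinator.is_empty()


scheduler = Scheduler(OutputCommitCoordinator())
paths = [scheduler.on_success, scheduler.on_fetch_failure,
         scheduler.on_cancel, scheduler.on_abort]
for stage_id, finish in enumerate(paths):
    scheduler.submit_stage(stage_id)
    finish(stage_id)
assert_data_structures_empty(scheduler)
```

If any one of the four paths skipped `mark_stage_as_finished` and only updated `running_stages`, the final assertion would fail with a leftover entry in `authorized_committers`, which is the kind of leak the regression test is meant to catch.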
Diffstat (limited to 'mllib/src')
0 files changed, 0 insertions, 0 deletions