[SPARK-18553][CORE] Fix leak of TaskSetManager following executor loss - spark

diff options

author	Josh Rosen <joshrosen@databricks.com>	2016-11-29 16:27:25 -0800
committer	Josh Rosen <joshrosen@databricks.com>	2016-11-29 16:27:25 -0800
commit	9a02f6821265ff67ba3f7b095cd1afaebd25a898 (patch)
tree	f5e74a63361b88bd4f91844347978a4e1c24ca00 /external
parent	3600635215f25d695c9be5931b5185fec8a35527 (diff)
download	spark-9a02f6821265ff67ba3f7b095cd1afaebd25a898.tar.gz spark-9a02f6821265ff67ba3f7b095cd1afaebd25a898.tar.bz2 spark-9a02f6821265ff67ba3f7b095cd1afaebd25a898.zip

[SPARK-18553][CORE] Fix leak of TaskSetManager following executor loss

## What changes were proposed in this pull request? _This is the master branch version of #15986; the original description follows:_ This patch fixes a critical resource leak in the TaskScheduler which could cause RDDs and ShuffleDependencies to be kept alive indefinitely if an executor with running tasks is permanently lost and the associated stage fails. This problem was originally identified by analyzing the heap dump of a driver belonging to a cluster that had run out of shuffle space. This dump contained several `ShuffleDependency` instances that were retained by `TaskSetManager`s inside the scheduler but were not otherwise referenced. Each of these `TaskSetManager`s was considered a "zombie" but had no running tasks and therefore should have been cleaned up. However, these zombie task sets were still referenced by the `TaskSchedulerImpl.taskIdToTaskSetManager` map. Entries are added to the `taskIdToTaskSetManager` map when tasks are launched and are removed inside of `TaskScheduler.statusUpdate()`, which is invoked by the scheduler backend while processing `StatusUpdate` messages from executors. The problem with this design is that a completely dead executor will never send a `StatusUpdate`. There is [some code](https://github.com/apache/spark/blob/072f4c518cdc57d705beec6bcc3113d9a6740819/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L338) in `statusUpdate` which handles tasks that exit with the `TaskState.LOST` state (which is supposed to correspond to a task failure triggered by total executor loss), but this state only seems to be used in Mesos fine-grained mode. There doesn't seem to be any code which performs per-task state cleanup for tasks that were running on an executor that completely disappears without sending any sort of final death message. The `executorLost` and [`removeExecutor`](https://github.com/apache/spark/blob/072f4c518cdc57d705beec6bcc3113d9a6740819/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L527) methods don't appear to perform any cleanup of the `taskId -> *` mappings, causing the leaks observed here. This patch's fix is to maintain a `executorId -> running task id` mapping so that these `taskId -> *` maps can be properly cleaned up following an executor loss. There are some potential corner-case interactions that I'm concerned about here, especially some details in [the comment](https://github.com/apache/spark/blob/072f4c518cdc57d705beec6bcc3113d9a6740819/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L523) in `removeExecutor`, so I'd appreciate a very careful review of these changes. ## How was this patch tested? I added a new unit test to `TaskSchedulerImplSuite`. /cc kayousterhout and markhamstra, who reviewed #15986. Author: Josh Rosen <joshrosen@databricks.com> Closes #16045 from JoshRosen/fix-leak-following-total-executor-loss-master.

Diffstat (limited to 'external')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: