[SPARK-18761][CORE] Introduce "task reaper" to oversee task killing in executors

## What changes were proposed in this pull request? Spark's current task cancellation / task killing mechanism is "best effort" because some tasks may not be interruptible or may not respond to their "killed" flags being set. If a significant fraction of a cluster's task slots are occupied by tasks that have been marked as killed but remain running then this can lead to a situation where new jobs and tasks are starved of resources that are being used by these zombie tasks. This patch aims to address this problem by adding a "task reaper" mechanism to executors. At a high-level, task killing now launches a new thread which attempts to kill the task and then watches the task and periodically checks whether it has been killed. The TaskReaper will periodically re-attempt to call `TaskRunner.kill()` and will log warnings if the task keeps running. I modified TaskRunner to rename its thread at the start of the task, allowing TaskReaper to take a thread dump and filter it in order to log stacktraces from the exact task thread that we are waiting to finish. If the task has not stopped after a configurable timeout then the TaskReaper will throw an exception to trigger executor JVM death, thereby forcibly freeing any resources consumed by the zombie tasks. This feature is flagged off by default and is controlled by four new configurations under the `spark.task.reaper.*` namespace. See the updated `configuration.md` doc for details. ## How was this patch tested? Tested via a new test case in `JobCancellationSuite`, plus manual testing. Author: Josh Rosen <joshrosen@databricks.com> Closes #16189 from JoshRosen/cancellation.
author: Josh Rosen <joshrosen@databricks.com> 2016-12-19 18:43:59 -0800
committer: Yin Huai <yhuai@databricks.com> 2016-12-19 18:43:59 -0800
commit: fa829ce21fb84028d90b739a49c4ece70a17ccfd (patch)
tree: 1ac5a7e18d76a3dfb94209c361fcb48f1b13eba0 /docs/configuration.md
parent: 5857b9ac2d9808d9b89a5b29620b5052e2beebf5 (diff)
download: spark-fa829ce21fb84028d90b739a49c4ece70a17ccfd.tar.gz
spark-fa829ce21fb84028d90b739a49c4ece70a17ccfd.tar.bz2
spark-fa829ce21fb84028d90b739a49c4ece70a17ccfd.zip
1 files changed, 42 insertions, 0 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index a8b7197267..39bfb3a05b 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1423,6 +1423,48 @@ Apart from these, the following properties are also available, and may be useful
     Should be greater than or equal to 1. Number of allowed retries = this value - 1.
   </td>
 </tr>
+<tr>
+  <td><code>spark.task.reaper.enabled</code></td>
+  <td>false</td>
+  <td>
+    Enables monitoring of killed / interrupted tasks. When set to true, any task which is killed
+    will be monitored by the executor until that task actually finishes executing. See the other
+    <code>spark.task.reaper.*</code> configurations for details on how to control the exact behavior
+    of this monitoring</code>. When set to false (the default), task killing will use an older code
+    path which lacks such monitoring.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.task.reaper.pollingInterval</code></td>
+  <td>10s</td>
+  <td>
+    When <code>spark.task.reaper.enabled = true</code>, this setting controls the frequency at which
+    executors will poll the status of killed tasks. If a killed task is still running when polled
+    then a warning will be logged and, by default, a thread-dump of the task will be logged
+    (this thread dump can be disabled via the <code>spark.task.reaper.threadDump</code> setting,
+    which is documented below).
+  </td>
+</tr>
+<tr>
+  <td><code>spark.task.reaper.threadDump</code></td>
+  <td>true</td>
+  <td>
+    When <code>spark.task.reaper.enabled = true</code>, this setting controls whether task thread
+    dumps are logged during periodic polling of killed tasks. Set this to false to disable
+    collection of thread dumps.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.task.reaper.killTimeout</code></td>
+  <td>-1</td>
+  <td>
+    When <code>spark.task.reaper.enabled = true</code>, this setting specifies a timeout after
+    which the executor JVM will kill itself if a killed task has not stopped running. The default
+    value, -1, disables this mechanism and prevents the executor from self-destructing. The purpose
+    of this setting is to act as a safety-net to prevent runaway uncancellable tasks from rendering
+    an executor unusable.
+  </td>
+</tr>
 </table>
 
 #### Dynamic Allocation
author	Josh Rosen <joshrosen@databricks.com>	2016-12-19 18:43:59 -0800
committer	Yin Huai <yhuai@databricks.com>	2016-12-19 18:43:59 -0800
commit	fa829ce21fb84028d90b739a49c4ece70a17ccfd (patch)
tree	1ac5a7e18d76a3dfb94209c361fcb48f1b13eba0 /docs/configuration.md
parent	5857b9ac2d9808d9b89a5b29620b5052e2beebf5 (diff)
download	spark-fa829ce21fb84028d90b739a49c4ece70a17ccfd.tar.gz spark-fa829ce21fb84028d90b739a49c4ece70a17ccfd.tar.bz2 spark-fa829ce21fb84028d90b739a49c4ece70a17ccfd.zip