author: José Hiram Soltren <jose@cloudera.com> 2017-02-09 12:49:31 -0600
committer: Imran Rashid <irashid@cloudera.com> 2017-02-09 12:49:31 -0600
commit: 6287c94f08200d548df5cc0a401b73b84f9968c4
tree: bd1e3eaf116c39d85584a203fbc84802b794e010 /repl/scala-2.11
parent: af63c52fd36c59525d9504003b15142dc850fccb
[SPARK-16554][CORE] Automatically Kill Executors and Nodes when they are Blacklisted
## What changes were proposed in this pull request?

In SPARK-8425, we introduced a mechanism for blacklisting executors and nodes (hosts). After a certain number of failures, these resources would be "blacklisted" and no further work would be assigned to them for some period of time.

In some scenarios, it is better to fail fast and simply kill these unreliable resources. This change proposes to do so by having the BlacklistTracker kill unreliable resources when they would otherwise be "blacklisted".

To be thread safe, this code depends on the CoarseGrainedSchedulerBackend sending a message to the driver backend to do the actual killing. This also helps to prevent a race that would permit work to begin on a resource (executor or node) between the time the resource is marked for killing and the time at which it is finally killed.

## How was this patch tested?

./dev/run-tests

Ran https://github.com/jsoltren/jose-utils/blob/master/blacklist/test-blacklist.sh, and checked logs to see executors and nodes being killed.

Testing can likely be improved here; suggestions welcome.

Author: José Hiram Soltren <jose@cloudera.com>

Closes #16650 from jsoltren/SPARK-16554-submit.
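
The kill-on-blacklist flow described above can be illustrated with a small single-threaded toy model. This is only a sketch under stated assumptions, not the code from the patch: `ToyDriver`, `ToyBlacklistTracker`, `KillExecutor`, `maxFailures`, and `killEnabled` are hypothetical names invented for illustration, and real Spark uses RPC endpoints and timeouts that are not modeled here. It shows a tracker that blacklists an executor after repeated failures and, instead of killing it directly, sends a message to a driver object that performs the kill when it processes its inbox.

```scala
import scala.collection.mutable

object BlacklistKillSketch {
  // Hypothetical message type; stands in for the request sent to the driver.
  final case class KillExecutor(executorId: String)

  // Toy driver endpoint: kill requests are queued and applied one at a time,
  // so only this class ever mutates the set of live executors.
  final class ToyDriver(initial: Set[String]) {
    private val inbox = mutable.Queue.empty[KillExecutor]
    private val alive = mutable.Set.empty[String]
    alive ++= initial

    def send(msg: KillExecutor): Unit = inbox.enqueue(msg)

    def processInbox(): Unit = {
      while (inbox.nonEmpty) {
        alive -= inbox.dequeue().executorId
      }
    }

    def aliveExecutors: Set[String] = alive.toSet
  }

  // Toy tracker: counts failures per executor and blacklists at a threshold.
  final class ToyBlacklistTracker(driver: ToyDriver, maxFailures: Int, killEnabled: Boolean) {
    private val failures = mutable.Map.empty[String, Int]
    private val blacklisted = mutable.Set.empty[String]

    def recordFailure(executorId: String): Unit = {
      val count = failures.getOrElse(executorId, 0) + 1
      failures(executorId) = count
      // add() returns true only the first time, so at most one kill request is sent.
      if (count >= maxFailures && blacklisted.add(executorId) && killEnabled) {
        driver.send(KillExecutor(executorId))
      }
    }

    def isBlacklisted(executorId: String): Boolean = blacklisted.contains(executorId)

    // Executors that may still receive work: alive and not blacklisted.
    def schedulable: Set[String] = driver.aliveExecutors.filterNot(isBlacklisted)
  }

  def main(args: Array[String]): Unit = {
    val driver = new ToyDriver(Set("exec-1", "exec-2"))
    val tracker = new ToyBlacklistTracker(driver, maxFailures = 2, killEnabled = true)
    tracker.recordFailure("exec-1")
    tracker.recordFailure("exec-1")
    println(tracker.schedulable)    // exec-1 is excluded as soon as it is blacklisted
    println(driver.aliveExecutors)  // exec-1 stays alive until the driver handles the message
    driver.processInbox()
    println(driver.aliveExecutors)  // exec-1 has now been removed
  }
}
```

In this toy, the executor stops being schedulable the moment it is blacklisted, while the kill itself happens later on the driver side. That ordering is one way to picture how the window between "marked for killing" and "finally killed" is kept free of new work.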
Diffstat (limited to 'repl/scala-2.11')
0 files changed, 0 insertions, 0 deletions