[SPARK-8167] Make tasks that fail from YARN preemption not fail job - spark

author	mcheah <mcheah@palantir.com>	2015-09-10 11:58:54 -0700
committer	Andrew Or <andrew@databricks.com>	2015-09-10 11:58:54 -0700
commit	af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0 (patch)
tree	979a7e64505f9ccdabf98148b8f8e9e745448e65 /.rat-excludes
parent	a76bde9dae54c4641e21f3c1ceb4870e3dc91881 (diff)
download	spark-af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0.tar.gz spark-af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0.tar.bz2 spark-af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0.zip

[SPARK-8167] Make tasks that fail from YARN preemption not fail job

The architecture is that, in YARN mode, if the driver detects that an executor has disconnected, it asks the ApplicationMaster why the executor died. If the ApplicationMaster is aware that the executor died because of preemption, none of the tasks associated with that executor are marked as failed. The executor is still removed from the driver's list of available executors, however.

There are a few open questions:

1. Should standalone mode have a similar "get executor loss reason" as well? I localized this change as much as possible to affect only YARN, but there could be a valid case to differentiate executor losses in standalone mode as well.
2. I make a pretty strong assumption in YarnAllocator that getExecutorLossReason(executorId) will only be called once per executor id; I do this so that I can remove the metadata from the in-memory map to avoid object accumulation. It's not clear if I'm being overly zealous to save space, however.

cc vanzin specifically for review because it collided with some earlier YARN scheduling work.
cc JoshRosen because it's similar to output commit coordination we did in the past.
cc andrewor14 for our discussion on how to get executor exit codes and loss reasons.

Author: mcheah <mcheah@palantir.com>

Closes #8007 from mccheah/feature/preemption-handling.
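The flow described in this message can be illustrated with a small, self-contained Scala sketch. It is not the code from this patch: `SketchAllocator`, `SketchDriver`, the `LossReason` hierarchy, and every method except the name `getExecutorLossReason` are hypothetical stand-ins, and "not marked as failed" is modelled here simply as not incrementing a per-task failure count. The allocator drops its entry on lookup, which mirrors the once-per-executor assumption raised in open question 2.

```scala
import scala.collection.mutable

// Hypothetical loss reasons; illustrative names, not Spark's classes.
sealed trait LossReason
case object Preempted extends LossReason
case class ExitedWithCode(code: Int) extends LossReason
case object Unknown extends LossReason

// Stand-in for the ApplicationMaster-side allocator (assumption).
class SketchAllocator {
  private val lossReasons = mutable.Map.empty[String, LossReason]

  def recordLoss(executorId: String, reason: LossReason): Unit =
    lossReasons(executorId) = reason

  // The entry is removed on lookup to avoid accumulating metadata,
  // so a second call for the same executor id yields Unknown.
  def getExecutorLossReason(executorId: String): LossReason =
    lossReasons.remove(executorId).getOrElse(Unknown)

  def trackedCount: Int = lossReasons.size
}

// Stand-in for the driver-side bookkeeping (assumption).
class SketchDriver(allocator: SketchAllocator) {
  private val executors = mutable.Set.empty[String]
  private val runningTasks = mutable.Map.empty[String, List[Int]]
  private val failureCounts = mutable.Map.empty[Int, Int]

  def register(executorId: String, taskIds: List[Int]): Unit = {
    executors += executorId
    runningTasks(executorId) = taskIds
  }

  def onExecutorDisconnected(executorId: String): Unit = {
    // Ask the allocator why the executor died.
    val reason = allocator.getExecutorLossReason(executorId)
    // The executor is removed from the available set regardless of the reason.
    executors -= executorId
    val lostTasks = runningTasks.remove(executorId).getOrElse(Nil)
    reason match {
      case Preempted => () // preemption: do not count the tasks as failed
      case _ =>
        lostTasks.foreach { t =>
          failureCounts(t) = failureCounts.getOrElse(t, 0) + 1
        }
    }
  }

  def isAvailable(executorId: String): Boolean = executors.contains(executorId)
  def failures(taskId: Int): Int = failureCounts.getOrElse(taskId, 0)
}
```

In this sketch, a preempted executor disappears from the available set while its tasks keep a failure count of zero, whereas an executor lost for any other reason increments the count for each task it was running. Removing the entry on lookup keeps the map small but makes a repeated query indistinguishable from an unknown loss, which is the trade-off the second open question points at.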

Diffstat (limited to '.rat-excludes')

0 files changed, 0 insertions, 0 deletions

