path: root/.rat-excludes
author mcheah <mcheah@palantir.com> 2015-09-10 11:58:54 -0700
committer Andrew Or <andrew@databricks.com> 2015-09-10 11:58:54 -0700
commit af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0
tree 979a7e64505f9ccdabf98148b8f8e9e745448e65 /.rat-excludes
parent a76bde9dae54c4641e21f3c1ceb4870e3dc91881
[SPARK-8167] Make tasks that fail from YARN preemption not fail job
In YARN mode, when the driver detects that an executor has disconnected, it asks the ApplicationMaster why the executor died. If the ApplicationMaster knows that the executor died because of preemption, the tasks associated with that executor are not marked as failed. The executor is still removed from the driver's list of available executors, however.

There are a few open questions:

1. Should standalone mode have a similar "get executor loss reason" mechanism as well? I localized this change as much as possible to affect only YARN, but there could be a valid case for differentiating executor losses in standalone mode as well.
2. I make a pretty strong assumption in YarnAllocator that getExecutorLossReason(executorId) will only be called once per executor id; I do this so that I can remove the metadata from the in-memory map to avoid object accumulation. It's not clear if I'm being overly zealous to save space, however.

cc vanzin specifically for review because it collided with some earlier YARN scheduling work.
cc JoshRosen because it's similar to output commit coordination we did in the past
cc andrewor14 for our discussion on how to get executor exit codes and loss reasons

Author: mcheah <mcheah@palantir.com>

Closes #8007 from mccheah/feature/preemption-handling.
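For readers who want the control flow at a glance, here is a rough, language-neutral sketch in Python of the behaviour described above. It is not Spark's code: `ToyAllocator`, `ToyDriver`, `LossReason`, `record_exit`, and the task lists are hypothetical names invented for illustration, and only the one-shot lookup mirrors the `getExecutorLossReason(executorId)` assumption mentioned in the message.

```python
from dataclasses import dataclass, field
from enum import Enum


class LossReason(Enum):
    # Hypothetical reasons; the real change may distinguish more cases.
    PREEMPTED = "preempted"
    UNKNOWN = "unknown"


@dataclass
class ToyAllocator:
    """Stand-in for the ApplicationMaster side (not Spark's YarnAllocator)."""
    _reasons: dict = field(default_factory=dict)

    def record_exit(self, executor_id, reason):
        # Remember why a container exited until the driver asks about it.
        self._reasons[executor_id] = reason

    def get_executor_loss_reason(self, executor_id):
        # One-shot lookup: pop() removes the entry so the map does not grow.
        # A second call for the same id therefore returns UNKNOWN.
        return self._reasons.pop(executor_id, LossReason.UNKNOWN)


@dataclass
class ToyDriver:
    """Stand-in for the driver side; the bookkeeping here is assumed."""
    allocator: ToyAllocator
    executors: dict = field(default_factory=dict)  # executor id -> running task ids
    failed_tasks: list = field(default_factory=list)
    lost_not_failed: list = field(default_factory=list)

    def on_executor_disconnected(self, executor_id):
        reason = self.allocator.get_executor_loss_reason(executor_id)
        # The executor is removed from the available set in every case.
        tasks = self.executors.pop(executor_id, [])
        if reason is LossReason.PREEMPTED:
            # Preemption: the tasks are lost but not counted as failures.
            self.lost_not_failed.extend(tasks)
        else:
            self.failed_tasks.extend(tasks)
        return reason


allocator = ToyAllocator()
allocator.record_exit("exec-1", LossReason.PREEMPTED)
driver = ToyDriver(allocator, executors={"exec-1": ["t1", "t2"], "exec-2": ["t3"]})
first = driver.on_executor_disconnected("exec-1")   # preempted executor
second = driver.on_executor_disconnected("exec-2")  # no recorded reason
```

In this sketch the `pop` call is what makes the "called once per executor id" assumption matter: any later query for `exec-1` would no longer see the preemption reason.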
Diffstat (limited to '.rat-excludes')
0 files changed, 0 insertions, 0 deletions