author | Patrick Woody <pwoody@palantir.com> | 2017-03-02 15:55:32 -0800 |
---|---|---|
committer | Kay Ousterhout <kayousterhout@gmail.com> | 2017-03-02 15:55:32 -0800 |
commit | 433d9eb6151a547af967cc1ac983a789bed60704 (patch) | |
tree | 1cdd8a5481d5394aaa3e935e20c0eaf3785e75d0 /mllib | |
parent | 5ae3516bfb7716f1793eb76b4fdc720b31829d07 (diff) | |
[SPARK-19631][CORE] OutputCommitCoordinator should not allow commits for already failed tasks
## What changes were proposed in this pull request?
Previously, a task failure could race with the commit of that task's output. For example, the driver may mark a task attempt as failed because of an executor heartbeat timeout (possibly caused by GC), yet once the executor recovers, the same attempt can still coordinate with the OutputCommitCoordinator and commit its result. Any retry attempt then fails, because the task's output has already been committed even though the original attempt was marked as failed.
This change ensures that a task attempt that has previously failed cannot enter the commit protocol.
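The rule described above can be illustrated with a small model. This is a minimal sketch in Python, not Spark's Scala implementation: the class `CommitCoordinatorSketch` and its methods `can_commit` and `attempt_failed` are hypothetical names chosen for illustration, and the bookkeeping (one authorized attempt per partition, plus a set of failed attempts) is an assumption about how such a coordinator might track state.

```python
from collections import defaultdict


class CommitCoordinatorSketch:
    """Toy model of a per-stage commit arbiter (hypothetical, not Spark's class)."""

    def __init__(self):
        # partition -> attempt number currently authorized to commit
        self._authorized = {}
        # partition -> attempt numbers the driver has already marked as failed
        self._failed = defaultdict(set)

    def can_commit(self, partition, attempt):
        # An attempt that was already marked as failed is always denied,
        # even if it asks to commit after its executor recovers.
        if attempt in self._failed[partition]:
            return False
        # Otherwise the first attempt to ask wins; later attempts are denied.
        if partition not in self._authorized:
            self._authorized[partition] = attempt
            return True
        return False

    def attempt_failed(self, partition, attempt):
        # Remember the failure so a late commit request can be rejected.
        self._failed[partition].add(attempt)
        # If the failed attempt held the commit right, release it for a retry.
        if self._authorized.get(partition) == attempt:
            del self._authorized[partition]


coordinator = CommitCoordinatorSketch()
coordinator.attempt_failed(partition=0, attempt=0)    # e.g. heartbeat timeout
late = coordinator.can_commit(partition=0, attempt=0)  # recovered executor asks
retry = coordinator.can_commit(partition=0, attempt=1)
print(late, retry)  # False True
```

In this model the late request from attempt 0 is denied, so the retry (attempt 1) is still able to commit. Without the failed-attempt check, attempt 0 would have been authorized and the retry denied, which is the race the commit message describes.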
## How was this patch tested?
Added a unit test.
Author: Patrick Woody <pwoody@palantir.com>
Closes #16959 from pwoody/pw/recordFailuresForCommitter.
Diffstat (limited to 'mllib')
0 files changed, 0 insertions, 0 deletions