author     Andrew Or <andrew@databricks.com>  2015-08-12 09:24:50 -0700
committer  Andrew Or <andrew@databricks.com>  2015-08-12 09:24:50 -0700
commit     be5d1912076c2ffd21ec88611e53d3b3c59b7ecc (patch)
tree       c1e540052ff82c04c275cbd5ea4227162ee17671 /extras/kinesis-asl
parent     2e680668f7b6fc158aa068aedd19c1878ecf759e (diff)
[SPARK-9795] Dynamic allocation: avoid double counting when killing same executor twice
This is based on KaiXinXiaoLei's changes in #7716.
The issue is that when someone calls `sc.killExecutor("1")` on the same executor twice in quick succession, the executor target is adjusted downwards by 2 instead of 1, even though only one executor is actually being killed. In cases where the target is not adjusted back upwards quickly, jobs can end up hanging.
This is a common danger because there are many places where this is called:
- `HeartbeatReceiver` kills an executor that has not been sending heartbeats
- `ExecutorAllocationManager` kills an executor that has been idle
- User code might call this directly, which may interfere with the previous callers
While it's not clear whether this fixes SPARK-9745, fixing this potential race condition seems like a strict improvement. I've added a regression test to illustrate the issue.
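The idea behind the fix can be illustrated with a minimal sketch (this is not Spark's actual implementation; the class and method names here are hypothetical): the target is decremented only for executors not already pending removal, so a repeated kill request for the same executor is counted once.

```python
# Hypothetical sketch of idempotent executor-kill accounting.
# Not Spark's real code: AllocationTracker and its methods are
# illustrative names only.
class AllocationTracker:
    def __init__(self, initial_target):
        self.target = initial_target
        # Executors whose kill has been requested but not yet completed.
        self.pending_to_remove = set()

    def kill_executor(self, executor_id):
        """Request a kill; repeated requests for the same id are no-ops."""
        if executor_id not in self.pending_to_remove:
            self.pending_to_remove.add(executor_id)
            self.target -= 1  # adjust target exactly once per executor
        return self.target

    def executor_removed(self, executor_id):
        # Once the kill completes, stop tracking the id.
        self.pending_to_remove.discard(executor_id)
```

With this bookkeeping, two quick calls of `kill_executor("1")` leave the target one below its starting value rather than two below, which is the double-counting bug described above.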
Author: Andrew Or <andrew@databricks.com>
Closes #8078 from andrewor14/da-double-kill.
Diffstat (limited to 'extras/kinesis-asl')
0 files changed, 0 insertions, 0 deletions