author: Kay Ousterhout <kayousterhout@gmail.com> 2015-10-27 10:46:43 -0700
committer: Kay Ousterhout <kayousterhout@gmail.com> 2015-10-27 10:46:43 -0700
commit: 9fc16a82adb5f3db2a250765c11393794404a51b
tree: e9d6f8fc4204aa697fc954f2713a6ec5e58afe1f /python/pyspark/mllib/linalg/distributed.py
parent: 360ed832f5213b805ac28cf1d2828be09480f2d6
[SPARK-11306] Fix hang when JVM exits.
This commit fixes a bug where, in Standalone mode, if a task fails and crashes the JVM, the failure is considered a "normal failure" (meaning it's considered unrelated to the task), so the failure isn't counted against the task's maximum number of failures: https://github.com/apache/spark/commit/af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0#diff-a755f3d892ff2506a7aa7db52022d77cL138. As a result, if a task fails in a way that results in it crashing the JVM, it will continuously be re-launched, resulting in a hang. This commit fixes that problem.

This bug was introduced by #8007; andrewor14 mccheah vanzin can you take a look at this?

This error is hard to trigger because we handle executor losses through 2 code paths (the second is via Akka, where Akka notices that the executor endpoint is disconnected). In my setup, the Akka code path completes first, and doesn't have this bug, so things work fine (see my recent email to the dev list about this). If I manually disable the Akka code path, I can see the hang (and this commit fixes the issue).

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #9273 from kayousterhout/SPARK-11306.
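
The relaunch loop described above can be illustrated with a small standalone simulation. This is only a sketch of the counting logic, not Spark's scheduler code: the function name, the `count_crash_against_task` flag, the failure limit of 4, and the `launch_cap` that stands in for "hangs forever" are all hypothetical names and values chosen for illustration.

```python
def simulate_crashing_task(count_crash_against_task, max_failures=4, launch_cap=100):
    """Simulate a task that crashes the JVM on every attempt.

    Hypothetical model, not Spark's implementation:
    - count_crash_against_task: whether an executor exit is attributed to the task
      (False models the buggy "normal failure" classification).
    - max_failures: illustrative per-task failure limit.
    - launch_cap: safety bound so the simulation terminates; hitting it
      stands in for the hang described in the commit message.

    Returns (number_of_launches, aborted).
    """
    failures = 0
    launches = 0
    while launches < launch_cap:
        launches += 1
        # In this model every launch ends with the JVM crashing.
        if count_crash_against_task:
            failures += 1
            if failures >= max_failures:
                # The failure limit is reached, so the job is aborted.
                return launches, True
        # Otherwise the crash is treated as unrelated to the task,
        # the counter never moves, and the task is simply relaunched.
    return launches, False


# Crash treated as unrelated to the task: relaunched until the cap is hit.
print(simulate_crashing_task(False))  # (100, False)

# Crash counted against the task: aborted after max_failures attempts.
print(simulate_crashing_task(True))   # (4, True)
```

In the first call the failure counter never advances, so only the artificial cap stops the loop; in the second, the same crashes exhaust the limit and the simulated job aborts instead of hanging.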
Diffstat (limited to 'python/pyspark/mllib/linalg/distributed.py')
0 files changed, 0 insertions, 0 deletions