author | Andrew Or <andrewor14@gmail.com> | 2014-05-07 14:35:22 -0700 |
---|---|---|
committer | Aaron Davidson <aaron@databricks.com> | 2014-05-07 14:35:37 -0700 |
commit | 82c8e89c9581c45c7878b8f406cf3d90d4b0d74c (patch) | |
tree | 09f45d6e9b347420e64c7ef2ef385851c68a54e8 /yarn | |
parent | 0759ee790527f61bf9f4bcef4aa0befa1d430370 (diff) | |
[SPARK-1688] Propagate PySpark worker stderr to driver
When at least one of the following conditions is true, PySpark cannot be loaded:
1. PYTHONPATH is not set
2. PYTHONPATH does not contain the python directory (or jar, in the case of YARN)
3. The jar does not contain pyspark files (YARN)
4. The jar does not contain py4j files (YARN)
However, we currently throw the same `java.io.EOFException` for all of the above cases when trying to read from the python daemon's output, so the message gives no hint as to which condition actually failed.
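For illustration only, the four conditions above can be checked up front with something like the sketch below. `diagnose_pythonpath` is a hypothetical helper, not code from this patch, and it assumes that finding a `pyspark/` or `py4j/` directory (or archive member) on some PYTHONPATH entry is a good enough proxy for "the files are present".

```python
import os
import zipfile


def diagnose_pythonpath(pythonpath):
    """Return a list of human-readable problems; empty if none were found.

    Hypothetical helper for illustration, not part of Spark.
    """
    if not pythonpath:
        return ["PYTHONPATH is not set"]

    found = {"pyspark": False, "py4j": False}
    for entry in pythonpath.split(os.pathsep):
        if os.path.isdir(entry):
            # Plain directory: look for the package directories directly.
            for pkg in found:
                if os.path.isdir(os.path.join(entry, pkg)):
                    found[pkg] = True
        elif zipfile.is_zipfile(entry):
            # Jar or zip (the YARN case): look for package members inside it.
            with zipfile.ZipFile(entry) as archive:
                names = archive.namelist()
            for pkg in found:
                if any(name.startswith(pkg + "/") for name in names):
                    found[pkg] = True

    return [
        "no PYTHONPATH entry contains " + pkg + " files"
        for pkg, present in found.items()
        if not present
    ]
```

Calling `diagnose_pythonpath(os.environ.get("PYTHONPATH"))` would then report which of the conditions applies rather than leaving it to be inferred from an EOF.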
This PR includes the python stderr and the PYTHONPATH in the exception propagated to the driver. Now, the exception message looks something like:
```
Error from python worker:
: No module named pyspark
PYTHONPATH was:
/path/to/spark/python:/path/to/some/jar
java.io.EOFException
<stack trace>
```
whereas before it was just
```
java.io.EOFException
<stack trace>
```
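The general idea (capture the child's stderr and attach it, together with the PYTHONPATH, to the error raised in the parent) can be sketched in plain Python as below. This is a minimal sketch, not the code in this patch: `run_worker` and `format_worker_error` are hypothetical names, and `RuntimeError` is only a stand-in for whatever exception type the real code raises.

```python
import os
import subprocess
import sys


def format_worker_error(stderr_text, pythonpath):
    """Build a message laid out like the example in this commit message."""
    indented = "\n".join("  " + line for line in stderr_text.splitlines())
    return (
        "Error from python worker:\n"
        + indented + "\n"
        + "PYTHONPATH was:\n"
        + "  " + pythonpath
    )


def run_worker(code, pythonpath):
    """Run `code` in a child interpreter; on failure, raise with its stderr.

    Hypothetical helper for illustration, not part of Spark.
    """
    env = dict(os.environ, PYTHONPATH=pythonpath)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,
    )
    if result.returncode != 0:
        # Surface what the child actually printed, plus the path it ran with.
        raise RuntimeError(format_worker_error(result.stderr, pythonpath))
    return result.stdout
```

With this shape, a failing `import pyspark` in the child shows up in the parent's error message along with the search path that was in effect, which is the information the bare `EOFException` lacked.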
Author: Andrew Or <andrewor14@gmail.com>
Closes #603 from andrewor14/pyspark-exception and squashes the following commits:
10d65d3 [Andrew Or] Throwable -> Exception, worker -> daemon
862d1d7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
a5ed798 [Andrew Or] Use block string and interpolation instead of var (minor)
cc09c45 [Andrew Or] Account for the fact that the python daemon may not have terminated yet
444f019 [Andrew Or] Use the new RedirectThread + include system PYTHONPATH
aab00ae [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
0cc2402 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
783efe2 [Andrew Or] Make python daemon stderr indentation consistent
9524172 [Andrew Or] Avoid potential NPE / error stream contention + Move things around
29f9688 [Andrew Or] Add back original exception type
e92d36b [Andrew Or] Include python worker stderr in the exception propagated to the driver
7c69360 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
cdbc185 [Andrew Or] Fix python attribute not found exception when PYTHONPATH is not set
dcc0353 [Andrew Or] Check both python and system environment variables for PYTHONPATH
6c09c21 [Andrew Or] Validate PYTHONPATH and PySpark modules before starting python workers
(cherry picked from commit 5200872243aa5906dc8a06772e61d75f19557aac)
Signed-off-by: Aaron Davidson <aaron@databricks.com>
Diffstat (limited to 'yarn')
0 files changed, 0 insertions, 0 deletions