author | Andrew Or <andrewor14@gmail.com> | 2014-08-27 23:03:46 -0700 |
---|---|---|
committer | Patrick Wendell <pwendell@gmail.com> | 2014-08-27 23:03:46 -0700 |
commit | dafe343499bbc688e266106e4bb897f9e619834e (patch) | |
tree | 346f636c4305ea503e5214d1abf7436a1fe271fa /bin/pyspark | |
parent | f38fab97c7970168f1bd81d4dc202e36322c95e3 (diff) | |
[HOTFIX] Wait for EOF only for the PySpark shell
In `SparkSubmitDriverBootstrapper`, we wait for the parent process to send us an `EOF` before finishing the application. This is applicable for the PySpark shell because we terminate the application the same way. However, if we run a Python application, for instance, the JVM never exits unless it receives a manual `EOF` from the user. This is causing a few tests to time out.
We only need to do this for the PySpark shell because Spark submit runs as a Python subprocess only in this case. Thus, the normal Spark shell doesn't need to go through this path even though it is also a REPL.
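The gating described above can be illustrated with a small bash sketch. This is not the actual `SparkSubmitDriverBootstrapper` code, which lives on the JVM side; the function name `wait_for_parent_if_shell` and the exact check against the value `1` are assumptions made for illustration. Only the `PYSPARK_SHELL` variable comes from this patch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: block on stdin until EOF only when the launcher
# has marked this process as the PySpark shell via PYSPARK_SHELL=1.
# The function name and the "== 1" comparison are illustrative assumptions.

wait_for_parent_if_shell() {
  if [[ "${PYSPARK_SHELL:-0}" == "1" ]]; then
    # cat returns once the parent closes its end of the pipe (EOF).
    cat > /dev/null
    echo "waited for EOF"
  else
    # Plain Python applications skip the wait, so the process can exit
    # as soon as the application finishes.
    echo "skipped EOF wait"
  fi
}

# Example usage (stdin closed immediately, so the wait returns at once):
#   PYSPARK_SHELL=1 wait_for_parent_if_shell < /dev/null
```

Keeping the check behind an environment variable means the launcher script decides which mode applies, and every other entry point keeps the default of not waiting.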
Thanks davies for reporting this.
Author: Andrew Or <andrewor14@gmail.com>
Closes #2170 from andrewor14/bootstrap-hotfix and squashes the following commits:
42963f5 [Andrew Or] Do not wait for EOF unless this is the pyspark shell
Diffstat (limited to 'bin/pyspark')
-rwxr-xr-x | bin/pyspark | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/bin/pyspark b/bin/pyspark
index 59cfdfa7c5..f553b314c5 100755
--- a/bin/pyspark
+++ b/bin/pyspark
@@ -102,6 +102,8 @@ if [[ "$1" =~ \.py$ ]]; then
   gatherSparkSubmitOpts "$@"
   exec $FWDIR/bin/spark-submit "${SUBMISSION_OPTS[@]}" $primary "${APPLICATION_OPTS[@]}"
 else
+  # PySpark shell requires special handling downstream
+  export PYSPARK_SHELL=1
   # Only use ipython if no command line arguments were provided [SPARK-1134]
   if [[ "$IPYTHON" = "1" ]]; then
     exec ${PYSPARK_PYTHON:-ipython} $IPYTHON_OPTS