From 5b4a5b1acdc439a58aa2a3561ac0e3fb09f529d6 Mon Sep 17 00:00:00 2001 From: cocoatomo Date: Thu, 2 Oct 2014 11:13:19 -0700 Subject: [SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ### Problem The section "Using the shell" in Spark Programming Guide (https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell) says that we can run pyspark REPL through IPython. But a folloing command does not run IPython but a default Python executable. ``` $ IPYTHON=1 ./bin/pyspark Python 2.7.8 (default, Jul 2 2014, 10:14:46) ... ``` the spark/bin/pyspark script on the commit b235e013638685758885842dc3268e9800af3678 decides which executable and options it use folloing way. 1. if PYSPARK_PYTHON unset * → defaulting to "python" 2. if IPYTHON_OPTS set * → set IPYTHON "1" 3. some python scripts passed to ./bin/pyspak → run it with ./bin/spark-submit * out of this issues scope 4. if IPYTHON set as "1" * → execute $PYSPARK_PYTHON (default: ipython) with arguments $IPYTHON_OPTS * otherwise execute $PYSPARK_PYTHON Therefore, when PYSPARK_PYTHON is unset, python is executed though IPYTHON is "1". In other word, when PYSPARK_PYTHON is unset, IPYTHON_OPS and IPYTHON has no effect on decide which command to use. PYSPARK_PYTHON | IPYTHON_OPTS | IPYTHON | resulting command | expected command ---- | ---- | ----- | ----- | ----- (unset → defaults to python) | (unset) | (unset) | python | (same) (unset → defaults to python) | (unset) | 1 | python | ipython (unset → defaults to python) | an_option | (unset → set to 1) | python an_option | ipython an_option (unset → defaults to python) | an_option | 1 | python an_option | ipython an_option ipython | (unset) | (unset) | ipython | (same) ipython | (unset) | 1 | ipython | (same) ipython | an_option | (unset → set to 1) | ipython an_option | (same) ipython | an_option | 1 | ipython an_option | (same) ### Suggestion The pyspark script should determine firstly whether a user wants to run IPython or other executables. 1. if IPYTHON_OPTS set * set IPYTHON "1" 2. if IPYTHON has a value "1" * PYSPARK_PYTHON defaults to "ipython" if not set 3. PYSPARK_PYTHON defaults to "python" if not set See the pull request for more detailed modification. Author: cocoatomo Closes #2554 from cocoatomo/issues/cannot-run-ipython-without-options and squashes the following commits: d2a9b06 [cocoatomo] [SPARK-3706][PySpark] Use PYTHONUNBUFFERED environment variable instead of -u option 264114c [cocoatomo] [SPARK-3706][PySpark] Remove the sentence about deprecated environment variables 42e02d5 [cocoatomo] [SPARK-3706][PySpark] Replace environment variables used to customize execution of PySpark REPL 10d56fb [cocoatomo] [SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset --- bin/pyspark | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) (limited to 'bin') diff --git a/bin/pyspark b/bin/pyspark index 5142411e36..6655725ef8 100755 --- a/bin/pyspark +++ b/bin/pyspark @@ -52,10 +52,20 @@ fi # Figure out which Python executable to use if [[ -z "$PYSPARK_PYTHON" ]]; then - PYSPARK_PYTHON="python" + if [[ "$IPYTHON" = "1" || -n "$IPYTHON_OPTS" ]]; then + # for backward compatibility + PYSPARK_PYTHON="ipython" + else + PYSPARK_PYTHON="python" + fi fi export PYSPARK_PYTHON +if [[ -z "$PYSPARK_PYTHON_OPTS" && -n "$IPYTHON_OPTS" ]]; then + # for backward compatibility + PYSPARK_PYTHON_OPTS="$IPYTHON_OPTS" +fi + # Add the PySpark classes to the Python path: export PYTHONPATH="$SPARK_HOME/python/:$PYTHONPATH" export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH" @@ -64,11 +74,6 @@ export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH" export OLD_PYTHONSTARTUP="$PYTHONSTARTUP" export PYTHONSTARTUP="$FWDIR/python/pyspark/shell.py" -# If IPython options are specified, assume user wants to run IPython -if [[ -n "$IPYTHON_OPTS" ]]; then - IPYTHON=1 -fi - # Build up arguments list manually to preserve quotes and backslashes. # We export Spark submit arguments as an environment variable because shell.py must run as a # PYTHONSTARTUP script, which does not take in arguments. This is required for IPython notebooks. @@ -106,10 +111,5 @@ if [[ "$1" =~ \.py$ ]]; then else # PySpark shell requires special handling downstream export PYSPARK_SHELL=1 - # Only use ipython if no command line arguments were provided [SPARK-1134] - if [[ "$IPYTHON" = "1" ]]; then - exec ${PYSPARK_PYTHON:-ipython} $IPYTHON_OPTS - else - exec "$PYSPARK_PYTHON" - fi + exec "$PYSPARK_PYTHON" $PYSPARK_PYTHON_OPTS fi -- cgit v1.2.3