author    cocoatomo <cocoatomo77@gmail.com>    2014-10-02 11:13:19 -0700
committer Josh Rosen <joshrosen@apache.org>    2014-10-02 11:13:19 -0700
commit    5b4a5b1acdc439a58aa2a3561ac0e3fb09f529d6 (patch)
tree      9cc7c9e6c186f7411a3e3a6a0dd515253097271a /core
parent    6e27cb630de69fa5acb510b4e2f6b980742b1957 (diff)
[SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset
### Problem

The section "Using the shell" in the Spark Programming Guide (https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell) says that we can run the PySpark REPL through IPython. But the following command does not run IPython; it runs the default Python executable:

```
$ IPYTHON=1 ./bin/pyspark
Python 2.7.8 (default, Jul 2 2014, 10:14:46)
...
```

The spark/bin/pyspark script at commit b235e013638685758885842dc3268e9800af3678 decides which executable and options to use in the following way:

1. if PYSPARK_PYTHON is unset
    * → defaults to "python"
2. if IPYTHON_OPTS is set
    * → set IPYTHON to "1"
3. if a Python script is passed to ./bin/pyspark → run it with ./bin/spark-submit
    * outside the scope of this issue
4. if IPYTHON is set to "1"
    * → execute $PYSPARK_PYTHON (default: ipython) with the arguments $IPYTHON_OPTS
    * otherwise execute $PYSPARK_PYTHON

Therefore, when PYSPARK_PYTHON is unset, python is executed even though IPYTHON is "1". In other words, when PYSPARK_PYTHON is unset, IPYTHON_OPTS and IPYTHON have no effect on which command is used.

PYSPARK_PYTHON | IPYTHON_OPTS | IPYTHON | resulting command | expected command
---- | ---- | ---- | ---- | ----
(unset → defaults to python) | (unset) | (unset) | python | (same)
(unset → defaults to python) | (unset) | 1 | python | ipython
(unset → defaults to python) | an_option | (unset → set to 1) | python an_option | ipython an_option
(unset → defaults to python) | an_option | 1 | python an_option | ipython an_option
ipython | (unset) | (unset) | ipython | (same)
ipython | (unset) | 1 | ipython | (same)
ipython | an_option | (unset → set to 1) | ipython an_option | (same)
ipython | an_option | 1 | ipython an_option | (same)

### Suggestion

The pyspark script should first determine whether the user wants to run IPython or some other executable:

1. if IPYTHON_OPTS is set
    * set IPYTHON to "1"
2. if IPYTHON is "1"
    * PYSPARK_PYTHON defaults to "ipython" if not set
3. PYSPARK_PYTHON defaults to "python" if not set

See the pull request for the detailed modification.

Author: cocoatomo <cocoatomo77@gmail.com>

Closes #2554 from cocoatomo/issues/cannot-run-ipython-without-options and squashes the following commits:

d2a9b06 [cocoatomo] [SPARK-3706][PySpark] Use PYTHONUNBUFFERED environment variable instead of -u option
264114c [cocoatomo] [SPARK-3706][PySpark] Remove the sentence about deprecated environment variables
42e02d5 [cocoatomo] [SPARK-3706][PySpark] Replace environment variables used to customize execution of PySpark REPL
10d56fb [cocoatomo] [SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset
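For illustration, here is a minimal POSIX shell sketch of the suggested precedence. This is a hypothetical reconstruction, not the actual bin/pyspark script; the variable names are the ones discussed above.

```
# Hypothetical sketch of the suggested precedence, not the actual bin/pyspark.

# 1. IPYTHON_OPTS being set implies IPython mode.
if [ -n "$IPYTHON_OPTS" ]; then
  IPYTHON=1
fi

# 2. In IPython mode, the executable defaults to ipython when unset.
if [ "$IPYTHON" = "1" ]; then
  PYSPARK_PYTHON="${PYSPARK_PYTHON:-ipython}"
fi

# 3. Otherwise it defaults to plain python.
PYSPARK_PYTHON="${PYSPARK_PYTHON:-python}"

# Launch the chosen REPL with any IPython options.
exec "$PYSPARK_PYTHON" $IPYTHON_OPTS
```

With this ordering, every row of the table above yields the expected command, because PYSPARK_PYTHON is only given its default after the IPython decision has been made.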
Diffstat (limited to 'core')
-rw-r--r--  core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala | 3
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala b/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
index b66c3ba4d5..79b4d7ea41 100644
--- a/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
@@ -54,9 +54,10 @@ object PythonRunner {
     val pythonPath = PythonUtils.mergePythonPaths(pathElements: _*)
 
     // Launch Python process
-    val builder = new ProcessBuilder(Seq(pythonExec, "-u", formattedPythonFile) ++ otherArgs)
+    val builder = new ProcessBuilder(Seq(pythonExec, formattedPythonFile) ++ otherArgs)
     val env = builder.environment()
     env.put("PYTHONPATH", pythonPath)
+    env.put("PYTHONUNBUFFERED", "YES") // value is needed to be set to a non-empty string
     env.put("PYSPARK_GATEWAY_PORT", "" + gatewayServer.getListeningPort)
     builder.redirectErrorStream(true) // Ugly but needed for stdout and stderr to synchronize
     val process = builder.start()
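The change above replaces the Python interpreter's `-u` flag with the PYTHONUNBUFFERED environment variable to keep output unbuffered. CPython only checks that the variable is non-empty (an empty string counts as unset), which is why the code sets it to "YES" rather than "". A quick shell illustration (`script.py` is a placeholder name):

```
# These two invocations behave the same under CPython: any non-empty
# PYTHONUNBUFFERED value has the same effect as the -u flag.
python -u script.py
PYTHONUNBUFFERED=YES python script.py
```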