author: Matei Zaharia <matei@databricks.com> 2014-05-06 15:12:35 -0700
committer: Matei Zaharia <matei@databricks.com> 2014-05-06 15:12:35 -0700
commit: 951a5d939863b42da83ac2569d5e9d7ed680e119
tree: 6ff0c545f577b05a86ce33d339cd0d487e935a38 /assembly
parent: ec09acdd4a72333e1c9c2e9d8e12e9c4c07770c8
[SPARK-1549] Add Python support to spark-submit
This PR updates spark-submit to allow submitting Python scripts (currently only with deploy-mode=client, but that's all that was supported before) and updates the PySpark code to properly find various paths, etc. One significant change is that we assume we can always find the Python files either from the Spark assembly JAR (which will happen with the Maven assembly build in make-distribution.sh) or from SPARK_HOME (which will exist in local mode even if you use sbt assembly, and should be enough for testing). This means we no longer need a weird hack to modify the environment for YARN.

This patch also updates the Python worker manager to run python with -u, which means unbuffered output (send it to our logs right away instead of waiting a while after stuff was written); this should simplify debugging.

In addition, it fixes https://issues.apache.org/jira/browse/SPARK-1709, setting the main class from a JAR's Main-Class attribute if not specified by the user, and fixes a few help strings and style issues in spark-submit.

In the future we may want to make the `pyspark` shell use spark-submit as well, but it seems unnecessary for 1.0.

Author: Matei Zaharia <matei@databricks.com>

Closes #664 from mateiz/py-submit and squashes the following commits:

15e9669 [Matei Zaharia] Fix some uses of path.separator property
051278c [Matei Zaharia] Small style fixes
0afe886 [Matei Zaharia] Add license headers
4650412 [Matei Zaharia] Add pyFiles to PYTHONPATH in executors, remove old YARN stuff, add tests
15f8e1e [Matei Zaharia] Set PYTHONPATH in PythonWorkerFactory in case it wasn't set from outside
47c0655 [Matei Zaharia] More work to make spark-submit work with Python:
d4375bd [Matei Zaharia] Clean up description of spark-submit args a bit and add Python ones
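
The Main-Class fallback mentioned in the message (SPARK-1709) can be illustrated with a small sketch. It is written in Python for brevity and is not the code in this patch: the names `main_class_from_jar`, `resolve_main_class`, and `build_demo_jar` are hypothetical, and the only assumption about the JAR format is the standard one that `META-INF/MANIFEST.MF` holds `Name: value` headers, with continuation lines starting with a single space.

```python
import io
import zipfile
from typing import Optional

MANIFEST_PATH = "META-INF/MANIFEST.MF"


def main_class_from_jar(jar) -> Optional[str]:
    """Return the Main-Class header of a JAR, or None if it has none.

    `jar` may be a filesystem path or a seekable binary file object.
    """
    with zipfile.ZipFile(jar) as zf:
        try:
            raw = zf.read(MANIFEST_PATH)
        except KeyError:
            return None  # JAR has no manifest at all

    # Join continuation lines (they begin with one space) onto the previous header.
    headers = []
    for line in raw.decode("utf-8").splitlines():
        if line.startswith(" ") and headers:
            headers[-1] += line[1:]
        else:
            headers.append(line)

    for header in headers:
        if not header:
            break  # a blank line ends the manifest's main section
        name, sep, value = header.partition(":")
        # Header names are case-insensitive in the JAR manifest format.
        if sep and name.strip().lower() == "main-class":
            return value.strip()
    return None


def resolve_main_class(user_main_class: Optional[str], jar) -> Optional[str]:
    """Prefer the class given by the user; otherwise fall back to the manifest."""
    return user_main_class or main_class_from_jar(jar)


def build_demo_jar(manifest: Optional[str]) -> io.BytesIO:
    """Build an in-memory JAR (hypothetical helper, only for trying the sketch)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        if manifest is not None:
            zf.writestr(MANIFEST_PATH, manifest)
        zf.writestr("placeholder.txt", "demo")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    jar = build_demo_jar("Manifest-Version: 1.0\r\nMain-Class: com.example.Main\r\n\r\n")
    print(resolve_main_class(None, jar))
```

A main class passed explicitly always wins in this sketch; the manifest is consulted only when the caller passes `None`.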
Diffstat (limited to 'assembly')
-rw-r--r-- assembly/pom.xml | 13
1 files changed, 0 insertions, 13 deletions
diff --git a/assembly/pom.xml b/assembly/pom.xml
index bdb3880649..7d123fb1d7 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -40,14 +40,6 @@
<deb.user>root</deb.user>
</properties>
- <repositories>
- <!-- A repository in the local filesystem for the Py4J JAR, which is not in Maven central -->
- <repository>
- <id>lib</id>
- <url>file://${project.basedir}/lib</url>
- </repository>
- </repositories>
-
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
@@ -84,11 +76,6 @@
<artifactId>spark-sql_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
- <dependency>
- <groupId>net.sf.py4j</groupId>
- <artifactId>py4j</artifactId>
- <version>0.8.1</version>
- </dependency>
</dependencies>
<build>