path: root/python/pyspark/shell.py
* [SPARK-1808] Route bin/pyspark through Spark submit (Andrew Or, 2014-05-16; 1 file, +1/-1)
  **Problem.** For `bin/pyspark`, there is currently no way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`.

  **Solution.** Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user-facing Spark scripts consistent.

  **Details.** `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications, so when `bin/pyspark` is given a python file we can simply pass the file directly to Spark submit and let it handle the rest. For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this; all we needed to change was to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case.

  This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too.

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #799 from andrewor14/pyspark-submit and squashes the following commits:

  bf37e36 [Andrew Or] Minor changes
  01066fa [Andrew Or] bin/pyspark for Windows
  c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes)
  1866f85 [Andrew Or] Windows is not cooperating
  456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set
  7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
  b7ba0d8 [Andrew Or] Address a few comments (minor)
  06eb138 [Andrew Or] Use shlex instead of writing our own parser
  05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
  a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly
  6fba412 [Andrew Or] Deal with quotes + address various comments
  fe4c8a7 [Andrew Or] Update --help for bin/pyspark
  afe47bf [Andrew Or] Fix spark shell
  f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
  a371d26 [Andrew Or] Route bin/pyspark through Spark submit
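  A minimal sketch of the PYSPARK_SUBMIT_ARGS handling the squashed commits describe (shlex parsing plus the guard against an unset variable); the exact code in shell.py and bin/pyspark may differ:

  ```python
  import os
  import shlex

  # shlex.split(None) falls back to reading stdin, which is the "hanging"
  # the commit guards against; default to an empty string instead.
  submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")

  # shlex keeps quoted app names (e.g. --name "my app") together as one token.
  args = shlex.split(submit_args)
  ```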
* Fixed broken pyspark shell. (Reynold Xin, 2014-04-18; 1 file, +2/-2)
  Author: Reynold Xin <rxin@apache.org>

  Closes #444 from rxin/pyspark and squashes the following commits:

  fc11356 [Reynold Xin] Made the PySpark shell version checking compatible with Python 2.6.
  571830b [Reynold Xin] Fixed broken pyspark shell.
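  The 2.6 compatibility fix likely hinges on how sys.version_info is read: the named fields (sys.version_info.major and friends) only exist from Python 2.7 on, while tuple indexing works everywhere. A hedged sketch:

  ```python
  import sys

  # Works on Python 2.6+: index the version tuple rather than using the
  # named attributes (sys.version_info.major) introduced in Python 2.7.
  if sys.version_info[0] != 2:
      raise RuntimeError("The PySpark shell requires Python 2")
  ```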
* [python alternative] pyspark require Python2, failing if system default is Py3 from shell.py (AbhishekKr, 2014-04-16; 1 file, +14/-6)

  Python alternative for https://github.com/apache/spark/pull/392; managed from shell.py

  Author: AbhishekKr <abhikumar163@gmail.com>

  Closes #399 from abhishekkr/pyspark_shell and squashes the following commits:

  134bdc9 [AbhishekKr] pyspark require Python2, failing if system default is Py3 from shell.py
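  A sketch of the check this commit describes, failing fast from shell.py when the default interpreter is Python 3 (the message text here is illustrative):

  ```python
  import sys

  # Print a readable error and bail out immediately, instead of crashing
  # later with a Python 3 syntax error somewhere inside pyspark.
  if sys.version_info[0] == 3:
      print("PySpark does not support Python 3; please run it with Python 2.")
      sys.exit(1)
  ```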
* Set spark.executor.uri from environment variable (needed by Mesos) (Ivan Wick, 2014-04-10; 1 file, +3/-0)
  The Mesos backend uses this property when setting up a slave process. It is similarly set in the Scala repl (org.apache.spark.repl.SparkILoop), but I couldn't find anything analogous for pyspark.

  Author: Ivan Wick <ivanwick+github@gmail.com>

  This patch had conflicts when merged, resolved by Committer: Matei Zaharia <matei@databricks.com>

  Closes #311 from ivanwick/master and squashes the following commits:

  da0c3e4 [Ivan Wick] Set spark.executor.uri from environment variable (needed by Mesos)
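  A sketch of mirroring the environment variable into the Spark property before the shell's context starts; SparkContext.setSystemProperty is the public hook for setting JVM-side properties from Python, though the commit's exact mechanism may differ:

  ```python
  import os
  from pyspark import SparkContext

  # Mesos slaves fetch the Spark distribution from spark.executor.uri,
  # so propagate SPARK_EXECUTOR_URI into that property when it is set.
  if "SPARK_EXECUTOR_URI" in os.environ:
      SparkContext.setSystemProperty(
          "spark.executor.uri", os.environ["SPARK_EXECUTOR_URI"])
  ```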
* SPARK-1099: Introduce local[*] mode to infer number of cores (Aaron Davidson, 2014-04-07; 1 file, +1/-1)
  This is the default mode for running spark-shell and pyspark, intended to allow users running Spark for the first time to see the performance benefits of using multiple cores, while not breaking backwards compatibility for users who use "local" mode and expect exactly 1 core.

  Author: Aaron Davidson <aaron@databricks.com>

  Closes #182 from aarondav/110 and squashes the following commits:

  a88294c [Aaron Davidson] Rebased changes for new spark-shell
  a9f393e [Aaron Davidson] SPARK-1099: Introduce local[*] mode to infer number of cores
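  For illustration, the three flavors of local master URL; "local[*]" sizes the scheduler to the machine's core count (the explicit SparkConf here is illustrative, since the shell builds its own context):

  ```python
  from pyspark import SparkConf, SparkContext

  # "local"     -> exactly 1 worker thread (the old default, kept as-is)
  # "local[4]"  -> 4 worker threads
  # "local[*]"  -> one worker thread per available core (new shell default)
  conf = SparkConf().setMaster("local[*]").setAppName("shell")
  sc = SparkContext(conf=conf)
  ```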
* Merge pull request #542 from markhamstra/versionBump. Closes #542. (Mark Hamstra, 2014-02-08; 1 file, +1/-1)
  Version number to 1.0.0-SNAPSHOT

  Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore. @pwendell

  Author: Mark Hamstra <markhamstra@gmail.com>

  == Merge branch commits ==

  commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71
  Author: Mark Hamstra <markhamstra@gmail.com>
  Date: Wed Feb 5 09:30:32 2014 -0800

  Version number to 1.0.0-SNAPSHOT
* pyspark -> bin/pyspark (Prashant Sharma, 2014-01-02; 1 file, +1/-1)
* Typo: avaiable -> available (Andrew Ash, 2013-12-24; 1 file, +1/-1)
* Update build version in master (Patrick Wendell, 2013-09-24; 1 file, +1/-1)
* Export StorageLevel and refactor (Aaron Davidson, 2013-09-07; 1 file, +1/-1)
* Remove reflection, hard-code StorageLevels (Aaron Davidson, 2013-09-07; 1 file, +2/-2)
  The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise the shell would have to call a private method of SparkContext. Having StorageLevel available in sc also doesn't seem like the end of the world. There may be a better solution, though.

  As for creating the StorageLevel object itself, this seems to be the best way in Python 2 for creating singleton, enum-like objects: http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python
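  The Stack Overflow pattern the message points to, applied to storage levels: instances of the class are bound to class attributes, giving enum-like singletons in Python 2. The constructor fields below are illustrative, not the exact shell.py definition:

  ```python
  class StorageLevel(object):
      """Enum-like holder: the 'members' are instances of the class
      itself, hung off the class as attributes."""
      def __init__(self, use_disk, use_memory, deserialized, replication=1):
          self.use_disk = use_disk
          self.use_memory = use_memory
          self.deserialized = deserialized
          self.replication = replication

  # Hard-coded singletons mirroring the JVM-side constants.
  StorageLevel.DISK_ONLY = StorageLevel(True, False, False)
  StorageLevel.MEMORY_ONLY = StorageLevel(False, True, True)
  StorageLevel.MEMORY_AND_DISK = StorageLevel(True, True, True)
  StorageLevel.MEMORY_AND_DISK_2 = StorageLevel(True, True, True, 2)
  ```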
* SPARK-660: Add StorageLevel support in Python (Aaron Davidson, 2013-09-05; 1 file, +2/-1)
  It uses reflection... I am not proud of that fact, but it at least ensures compatibility (sans refactoring of the StorageLevel stuff).
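  The reflection-based approach presumably pulled the constants from the JVM through the Py4J gateway; a hedged sketch of that shape, assuming a live SparkContext `sc` as the shell provides:

  ```python
  # Look a storage level up by name on the Scala side instead of
  # hard-coding it in Python.
  def jvm_storage_level(sc, name):
      jvm_cls = sc._jvm.org.apache.spark.storage.StorageLevel
      return getattr(jvm_cls, name)()  # e.g. name = "MEMORY_ONLY"
  ```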
* Add banner to PySpark and make wordcount output nicer (Matei Zaharia, 2013-09-01; 1 file, +13/-0)
* Merge pull request #813 from AndreSchumacher/add_files_pyspark (Matei Zaharia, 2013-08-12; 1 file, +6/-1)
  Implementing SPARK-865: Add the equivalent of ADD_JARS to PySpark
  * Implementing SPARK-865: Add the equivalent of ADD_JARS to PySpark (Andre Schumacher, 2013-08-12; 1 file, +6/-1)
    Now ADD_FILES uses a comma as the file name separator.
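  A sketch of the ADD_FILES handling as described: split the comma-separated list and hand it to the context as pyFiles (the real shell.py wiring may differ):

  ```python
  import os
  from pyspark import SparkContext

  # ADD_FILES mirrors the Scala shell's ADD_JARS: a comma-separated list
  # of files to ship with every job started from the shell.
  add_files = os.environ.get("ADD_FILES")
  py_files = add_files.split(",") if add_files else []
  sc = SparkContext("local[*]", "PySparkShell", pyFiles=py_files)
  ```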
* Add Apache license headers and LICENSE and NOTICE files (Matei Zaharia, 2013-07-16; 1 file, +17/-0)
* Make module help available in python shell. (Patrick Wendell, 2013-01-30; 1 file, +1/-0)
  Also adds a line to the docs explaining how to use it.
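  The change presumably amounts to importing the module into the shell's namespace so the standard help() machinery can reach it:

  ```python
  import pyspark

  # With the module imported, an interactive user can simply run:
  #   >>> help(pyspark)
  ```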
* Added accumulators to PySpark (Matei Zaharia, 2013-01-20; 1 file, +2/-2)
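  Accumulator use from the shell looks like this (assuming the `sc` the shell creates): a write-only shared counter that tasks add to and only the driver reads back:

  ```python
  # Sum 0..99 across tasks; only the driver may read acc.value.
  acc = sc.accumulator(0)
  sc.parallelize(range(100)).foreach(lambda x: acc.add(x))
  print(acc.value)  # 4950
  ```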
* Add `pyspark` script to replace the other scripts. (Josh Rosen, 2013-01-01; 1 file, +10/-26)
  Expand the PySpark programming guide.
* Rename top-level 'pyspark' directory to 'python' (Josh Rosen, 2013-01-01; 1 file, +33/-0)