spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	SPARK-1944 Document --verbose in spark-shell -h	Andrew Ash	2014-06-09	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-1944 Author: Andrew Ash <andrew@andrewash.com> Closes #1020 from ash211/SPARK-1944 and squashes the following commits: a831c4d [Andrew Ash] SPARK-1944 Document --verbose in spark-shell -h (cherry picked from commit 35630c86ff0e27862c9d902887eb0a24d25867ae) Signed-off-by: Reynold Xin <rxin@apache.org>
*	[SPARK-2067] use relative path for Spark logo in UI	Neville Li	2014-06-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Author: Neville Li <neville@spotify.com> Closes #1006 from nevillelyh/gh/SPARK-2067 and squashes the following commits: 9ee64cf [Neville Li] [SPARK-2067] use relative path for Spark logo in UI (cherry picked from commit 15ddbef414d5fd6d4672936ba3c747b5fb7ab52b) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	SPARK-2043: ExternalAppendOnlyMap doesn't always find matching keys	Matei Zaharia	2014-06-05	2	-5/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current implementation reads one key with the next hash code as it finishes reading the keys with the current hash code, which may cause it to miss some matches of the next key. This can cause operations like join to give the wrong result when reduce tasks spill to disk and there are hash collisions, as values won't be matched together. This PR fixes it by not reading in that next key, using a peeking iterator instead. Author: Matei Zaharia <matei@databricks.com> Closes #986 from mateiz/spark-2043 and squashes the following commits: 0959514 [Matei Zaharia] Added unit test for having many hash collisions 892debb [Matei Zaharia] SPARK-2043: don't read a key with the next hash code in ExternalAppendOnlyMap, instead use a buffered iterator to only read values with the current hash code. (cherry picked from commit b45c13e7d798f97b92f1a6329528191b8d779c4f) Signed-off-by: Matei Zaharia <matei@databricks.com>
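The fix described above relies on a peeking (buffered) iterator: the reader inspects the next key's hash code without consuming the entry, so it stops exactly at the boundary between hash groups. A minimal Python sketch of that idea follows. It is not Spark's Scala implementation; `PeekingIterator` and `read_hash_group` are hypothetical names, and the input is assumed to be sorted by hash code.

```python
class PeekingIterator:
    """Wraps an iterator and lets the caller look at the next item without consuming it."""

    _EMPTY = object()  # sentinel meaning "nothing buffered"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._head = self._EMPTY

    def has_next(self):
        # Buffer one item if nothing is buffered yet.
        if self._head is self._EMPTY:
            self._head = next(self._it, self._EMPTY)
        return self._head is not self._EMPTY

    def peek(self):
        if not self.has_next():
            raise StopIteration
        return self._head

    def next(self):
        if not self.has_next():
            raise StopIteration
        item, self._head = self._head, self._EMPTY
        return item


def read_hash_group(stream, hash_fn=hash):
    """Consume and return every (key, value) pair sharing the hash code of the next key.

    The first pair with a different hash code is only peeked at, so it is
    still available to the following call.
    """
    if not stream.has_next():
        return []
    current = hash_fn(stream.peek()[0])
    group = []
    while stream.has_next() and hash_fn(stream.peek()[0]) == current:
        group.append(stream.next())
    return group
```

Because the boundary entry is never consumed, keys that collide on the next hash code are all still present when their group is read.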
*	SPARK-1677: allow user to disable output dir existence checking	CodingCat	2014-06-05	2	-2/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-1677 For compatibility with older versions of Spark it would be nice to have an option `spark.hadoop.validateOutputSpecs` (default true) for the user to disable the output directory existence checking Author: CodingCat <zhunansjtu@gmail.com> Closes #947 from CodingCat/SPARK-1677 and squashes the following commits: 7930f83 [CodingCat] miao c0c0e03 [CodingCat] bug fix and doc update 5318562 [CodingCat] bug fix 13219b5 [CodingCat] allow user to disable output dir existence checking (cherry picked from commit 89cdbb087cb2f0d03be2dd77440300c6bd61c792) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	SPARK-1518: FileLogger: Fix compile against Hadoop trunk	Colin McCabe	2014-06-04	1	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In Hadoop trunk (currently Hadoop 3.0.0), the deprecated FSDataOutputStream#sync() method has been removed. Instead, we should call FSDataOutputStream#hflush, which does the same thing as the deprecated method used to do. Author: Colin McCabe <cmccabe@cloudera.com> Closes #898 from cmccabe/SPARK-1518 and squashes the following commits: 752b9d7 [Colin McCabe] FileLogger: Fix compile against Hadoop trunk (cherry picked from commit 1765c8d0ddf6bb5bc3c21f994456eba04c581de4) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	Improve maven plugin configuration	witgo	2014-06-01	1	-29/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	Author: witgo <witgo@qq.com> Closes #786 from witgo/maven_plugin and squashes the following commits: 5de86a2 [witgo] Merge branch 'master' of https://github.com/apache/spark into maven_plugin c35ef73 [witgo] Improve maven plugin configuration Conflicts: pom.xml
*	Super minor: Close inputStream in SparkSubmitArguments	Aaron Davidson	2014-05-31	1	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	`Properties#load()` doesn't close the InputStream, but it'd be closed after being GC'd anyway... Also changed file.getName to file, because getName only shows the filename. This will show the full (possibly relative) path, which is less confusing if it's not found. Author: Aaron Davidson <aaron@databricks.com> Closes #914 from aarondav/tiny and squashes the following commits: db9d072 [Aaron Davidson] Super minor: Close inputStream in SparkSubmitArguments (cherry picked from commit 7d52777effd0ff41aed545f53d2ab8de2364a188) Signed-off-by: Reynold Xin <rxin@apache.org>
*	correct tiny comment error	Chen Chao	2014-05-31	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	Author: Chen Chao <crazyjvm@gmail.com> Closes #928 from CrazyJvm/patch-8 and squashes the following commits: 144328b [Chen Chao] correct tiny comment error (cherry picked from commit 9ecc40d3aeff0eb113f16df55f4249d8143f37f1) Signed-off-by: Reynold Xin <rxin@apache.org>
*	[SPARK-1901] worker should make sure executor has exited before updating ↵	Zhen Peng	2014-05-30	1	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	executor's info https://issues.apache.org/jira/browse/SPARK-1901 Author: Zhen Peng <zhenpeng01@baidu.com> Closes #854 from zhpengg/bugfix-worker-kills-executor and squashes the following commits: 21d380b [Zhen Peng] add some error messages 506cea6 [Zhen Peng] add some docs for killProcess() a0b9860 [Zhen Peng] [SPARK-1901] worker should make sure executor has exited before updating executor's info
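The ordering this commit asks for (confirm the process is gone, then record the new state) can be sketched in Python. This is an illustration under assumptions, not the worker's Scala code: `kill_and_wait`, `stop_executor`, and the `executors` dictionary layout are hypothetical.

```python
import subprocess


def kill_and_wait(process, timeout=5.0):
    """Stop a child process and return its exit code only after it has exited."""
    process.terminate()  # polite request first
    try:
        return process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()  # escalate if the process ignores the request
        return process.wait()


def stop_executor(executors, executor_id):
    """Update the bookkeeping entry only once the process has really exited."""
    entry = executors[executor_id]
    exit_code = kill_and_wait(entry["process"])
    entry["state"] = "KILLED"
    entry["exit_code"] = exit_code
    return exit_code
```

Updating the state before `wait()` returns would let other components act on an executor that is still running, which is the situation the commit title describes.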
*	[SPARK-1712]: TaskDescription instance is too big causes Spark to hang	witgo	2014-05-28	4	-8/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Author: witgo <witgo@qq.com> Closes #694 from witgo/SPARK-1712_new and squashes the following commits: 0f52483 [witgo] review commit 83ce29b [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 52e6752 [witgo] reset test SparkContext 63636b6 [witgo] review commit 44a59ee [witgo] review commit 3b6d48c [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 926bd6a [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 9a5cfad [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 03cc562 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new b0930b0 [witgo] review commit b1174bd [witgo] merge master f76679b [witgo] merge master 689495d [witgo] fix scala style bug 1d35c3c [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 062c182 [witgo] fix small bug for code style 0a428cf [witgo] add unit tests 158b2dc [witgo] review commit 4afe71d [witgo] review commit 9e4ffa7 [witgo] review commit 1d35c7d [witgo] fix hang 7965580 [witgo] fix Statement order 0e29eac [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 3ea1ca1 [witgo] remove duplicate serialize 743a7ad [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 86e2048 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 2a89adc [witgo] SPARK-1712: TaskDescription instance is too big causes Spark to hang (cherry picked from commit 4dbb27b0cf4eb67c92aad2c1158616312f5a54e6) Signed-off-by: Matei Zaharia <matei@databricks.com>
*	bugfix worker DriverStateChanged state should match DriverState.FAILED	lianhuiwang	2014-05-27	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	bugfix worker DriverStateChanged state should match DriverState.FAILED Author: lianhuiwang <lianhuiwang09@gmail.com> Closes #864 from lianhuiwang/master and squashes the following commits: 480ce94 [lianhuiwang] address aarondav comments f2b5970 [lianhuiwang] bugfix worker DriverStateChanged state should match DriverState.FAILED
*	SPARK-1932: Fix race conditions in onReceiveCallback and cachedPeers	zsxwing	2014-05-26	2	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	`var cachedPeers: Seq[BlockManagerId] = null` is used in `def replicate(blockId: BlockId, data: ByteBuffer, level: StorageLevel)` without proper protection. There are two places that call `replicate(blockId, bytesAfterPut, level)`: * https://github.com/apache/spark/blob/17f3075bc4aa8cbed165f7b367f70e84b1bc8db9/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L644 runs in `connectionManager.futureExecContext` * https://github.com/apache/spark/blob/17f3075bc4aa8cbed165f7b367f70e84b1bc8db9/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L752 `doPut` runs in `connectionManager.handleMessageExecutor`. `org.apache.spark.storage.BlockManagerWorker` calls `blockManager.putBytes` in `connectionManager.handleMessageExecutor`. As they run in different `Executor`s, this is a race condition that may leave the memory pointed to by `cachedPeers` incorrect even if `cachedPeers != null`. The race condition in `onReceiveCallback` is that it is set in `BlockManagerWorker` but read from a different thread in `ConnectionManager.handleMessageExecutor`. Author: zsxwing <zsxwing@gmail.com> Closes #887 from zsxwing/SPARK-1932 and squashes the following commits: 524f69c [zsxwing] SPARK-1932: Fix race conditions in onReceiveCallback and cachedPeers (cherry picked from commit 549830b0db2c8b069391224f3a73bb0d7f397f71) Signed-off-by: Aaron Davidson <aaron@databricks.com>
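The hazard described above is a lazily initialised field that is read and written from different thread pools with no synchronisation. One common remedy, shown here as a Python sketch, is to guard the cached value with a lock. This is not necessarily what the commit itself changed (the diff is not shown in this log); `PeerCache` and `fetch_peers` are hypothetical names.

```python
import threading


class PeerCache:
    """Lazily fetches a peer list once and shares it safely across threads."""

    def __init__(self, fetch_peers):
        self._fetch_peers = fetch_peers  # hypothetical callable returning the peers
        self._lock = threading.Lock()
        self._cached_peers = None

    def get(self):
        # The lock makes the check-then-assign sequence atomic and ensures that
        # every thread sees a fully initialised list.
        with self._lock:
            if self._cached_peers is None:
                self._cached_peers = list(self._fetch_peers())
            return self._cached_peers
```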
*	SPARK-1933: Throw a more meaningful exception when a directory is passed to ↵	Reynold Xin	2014-05-26	2	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	addJar/addFile. https://issues.apache.org/jira/browse/SPARK-1933 Author: Reynold Xin <rxin@apache.org> Closes #888 from rxin/addfile and squashes the following commits: 8c402a3 [Reynold Xin] Updated comment. ff6c162 [Reynold Xin] SPARK-1933: Throw a more meaningful exception when a directory is passed to addJar/addFile. (cherry picked from commit 90e281b55aecbfbe4431ac582311d5790fe7aad3) Signed-off-by: Reynold Xin <rxin@apache.org>
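The idea of failing early with a descriptive message, rather than letting a directory cause an obscure error later, can be sketched as follows. The function name and the message wording are hypothetical and are not taken from Spark's `addJar`/`addFile`.

```python
import os


def validate_added_file(path):
    """Return the absolute path of a regular file, or raise a descriptive error."""
    if os.path.isdir(path):
        raise ValueError(
            f"Added file {path} is a directory; only regular files can be added."
        )
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Added file {path} does not exist.")
    return os.path.abspath(path)
```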
*	[maven-release-plugin] prepare for next development iteration	Tathagata Das	2014-05-26	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc11 (tag: v1.0.0)	Tathagata Das	2014-05-26	1	-1/+1
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc11"	Tathagata Das	2014-05-26	1	-1/+1
\| \| \| \|	This reverts commit 2f1dc868e5714882cf40d2633fb66772baf34789.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Tathagata Das	2014-05-26	1	-1/+1
\| \| \| \|	This reverts commit 832dc594e7666f1d402334f8015ce29917d9c888.
*	HOTFIX: Add no-arg SparkContext constructor in Java	Patrick Wendell	2014-05-25	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Self explanatory. Author: Patrick Wendell <pwendell@gmail.com> Closes #878 from pwendell/java-constructor and squashes the following commits: 2cc1605 [Patrick Wendell] HOTFIX: Add no-arg SparkContext constructor in Java (cherry picked from commit b6d22af040073cd611b0fcfdf8a5259c0dfd854c) Signed-off-by: Aaron Davidson <aaron@databricks.com>
*	[maven-release-plugin] prepare for next development iteration	Tathagata Das	2014-05-25	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc11	Tathagata Das	2014-05-25	1	-1/+1
\|
*	[SPARK-1886] check executor id existence when executor exit	Zhen Peng	2014-05-24	1	-8/+14
\| \| \| \| \| \| \| \| \| \| \|	Author: Zhen Peng <zhenpeng01@baidu.com> Closes #827 from zhpengg/bugfix-executor-id-not-found and squashes the following commits: cd8bb65 [Zhen Peng] bugfix: check executor id existence when executor exit (cherry picked from commit 4e4831b8facc186cda6ef31040ccdeab48acbbb7) Signed-off-by: Aaron Davidson <aaron@databricks.com>
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc10"	Tathagata Das	2014-05-25	1	-1/+1
\| \| \| \|	This reverts commit d807023479ce10aec28ef3c1ab646ddefc2e663c.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Tathagata Das	2014-05-25	1	-1/+1
\| \| \| \|	This reverts commit 67dd53d2556f03ce292e6889128cf441f1aa48f8.
*	[SPARK-1900 / 1918] PySpark on YARN is broken	Andrew Or	2014-05-24	7	-43/+314
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If I run the following on a YARN cluster ``` bin/spark-submit sheep.py --master yarn-client ``` it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file: ``` bin/spark-submit file:/path/to/sheep.py --master yarn-client ``` However, this also fails. This time it is because python does not understand URI schemes. This PR fixes this by automatically resolving all paths passed as command line arguments to `spark-submit` properly. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For python, we strip the URI scheme before we actually try to run it. Much of the code was originally written by @mengxr. Tested on YARN cluster. More tests pending. Author: Andrew Or <andrewor14@gmail.com> Closes #853 from andrewor14/submit-paths and squashes the following commits: 0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH 323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell 3c36587 [Andrew Or] Improve error messages (minor) 854aa6a [Andrew Or] Guard against NPE if user gives pathological paths 6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in 3bb0359 [Andrew Or] Update more comments (minor) 2a1f8a0 [Andrew Or] Update comments (minor) 6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths a68c4d1 [Andrew Or] Handle Windows python file path correctly 427a250 [Andrew Or] Resolve paths properly for Windows a591a4a [Andrew Or] Update tests for resolving URIs 6c8621c [Andrew Or] Move resolveURIs to Utils db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths f542dce [Andrew Or] Fix outdated tests 691c4ce [Andrew Or] Ignore special primary resource names 5342ac7 [Andrew Or] Add missing space in error message 02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly (cherry picked from commit 5081a0a9d47ca31900ea4de570de2cbb0e063105) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
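The two steps described in this commit message, resolving scheme-less paths to `file:` URIs and stripping the scheme again before handing a path to Python, can be sketched as below. This is an illustration only: the function names are hypothetical (the squashed commits mention a `resolveURIs` helper in Scala, whose behaviour is not reproduced here), and the sketch assumes POSIX-style paths, ignoring the Windows handling the PR also covers.

```python
import os
from urllib.parse import urlparse


def resolve_uri(path):
    """Leave URIs that already carry a scheme alone; turn bare paths into file: URIs."""
    if urlparse(path).scheme:
        return path
    return "file:" + os.path.abspath(path)


def strip_scheme_for_python(uri):
    """Return a plain filesystem path that the Python interpreter can open."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("", "file"):
        raise ValueError(f"Python cannot run a non-local file: {uri}")
    return parsed.path
```

With this in place, `sheep.py` and `file:/path/to/sheep.py` both end up as an explicit local URI for the submission logic and as a plain path for the interpreter.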
*	Fix UISuite unit test that fails under Jenkins contention	Aaron Davidson	2014-05-22	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to perhaps zombie processes on Jenkins, it seems that at least 10 Spark ports are in use. It also doesn't matter that the port increases when used, it could in fact go down -- the only part that matters is that it selects a different port rather than failing to bind. Changed test to match this. Thanks to @andrewor14 for helping diagnose this. Author: Aaron Davidson <aaron@databricks.com> Closes #857 from aarondav/tiny and squashes the following commits: c199ec8 [Aaron Davidson] Fix UISuite unit test that fails under Jenkins contention (cherry picked from commit f9f5fd5f4e81828a3e0c391892e0f28751568843) Signed-off-by: Reynold Xin <rxin@apache.org>
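The property the revised test checks, that a busy port leads to a different port rather than a bind failure, can be illustrated with plain sockets. This is a hedged sketch, not the Spark UI code: `bind_with_fallback` is a hypothetical helper, and it falls back to an OS-assigned port, whereas the message above only says that some different port is selected.

```python
import socket


def bind_with_fallback(preferred_port, host="127.0.0.1"):
    """Bind to preferred_port if it is free, otherwise to any free port.

    Returns the bound socket and the port actually used. The fallback port
    may be higher or lower than the preferred one.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, preferred_port))
    except OSError:
        # Port is taken: start over with a fresh socket and let the OS choose.
        sock.close()
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((host, 0))
    return sock, sock.getsockname()[1]
```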
*	[SPARK-1870] Make spark-submit --jars work in yarn-cluster mode.	Xiangrui Meng	2014-05-22	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Send secondary jars to the distributed cache of all containers and add the cached jars to the classpath before executors start. Tested on a YARN cluster (CDH-5.0). `spark-submit --jars` also works in standalone server and `yarn-client`. Thanks to @andrewor14 for testing! I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from `spark-submit`'s help message, though we haven't tested mesos yet. CC: @dbtsai @sryza Author: Xiangrui Meng <meng@databricks.com> Closes #848 from mengxr/yarn-classpath and squashes the following commits: 23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid confliction apped $CWD/ and $CWD/* to the classpath remove unused methods a40f6ed [Xiangrui Meng] standalone -> cluster 65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client 11e5354 [Xiangrui Meng] minor changes 3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn (cherry picked from commit dba314029b4c9d72d7e48a2093b39edd01931f57) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
*	[Minor] Move JdbcRDDSuite to the correct package	Andrew Or	2014-05-21	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It was in the wrong package Author: Andrew Or <andrewor14@gmail.com> Closes #839 from andrewor14/jdbc-suite and squashes the following commits: f948c5a [Andrew Or] cache -> cache() b215279 [Andrew Or] Move JdbcRDDSuite to the correct package (cherry picked from commit 7c79ef7d43de258ad9a5de15c590132bd78ce8dd) Signed-off-by: Reynold Xin <rxin@apache.org>
*	[maven-release-plugin] prepare for next development iteration	Tathagata Das	2014-05-20	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc10	Tathagata Das	2014-05-20	1	-1/+1
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc9"	Tathagata Das	2014-05-19	1	-1/+1
\| \| \| \|	This reverts commit 920f947eb5a22a679c0c3186cf69ee75f6041c75.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Tathagata Das	2014-05-19	1	-1/+1
\| \| \| \|	This reverts commit f8e611955096c5c1c7db5764b9d2851b1d295f0d.
*	[Spark 1877] ClassNotFoundException when loading RDD with serialized objects	Tathagata Das	2014-05-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Updated version of #821 Author: Tathagata Das <tathagata.das1565@gmail.com> Author: Ghidireac <bogdang@u448a5b0a73d45358d94a.ant.amazon.com> Closes #835 from tdas/SPARK-1877 and squashes the following commits: f346f71 [Tathagata Das] Addressed Patrick's comments. fee0c5d [Ghidireac] SPARK-1877: ClassNotFoundException when loading RDD with serialized objects (cherry picked from commit 52eb54d02403a3c37d84b9da7cc1cdb261048cf8) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
*	SPARK-1689: Spark application should die when removed by Master	Aaron Davidson	2014-05-19	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	scheduler.error() will mask the error if there are active tasks. Being removed is a cataclysmic event for Spark applications, and should probably be treated as such. Author: Aaron Davidson <aaron@databricks.com> Closes #832 from aarondav/i-love-u and squashes the following commits: 9f1200f [Aaron Davidson] SPARK-1689: Spark application should die when removed by Master (cherry picked from commit b0ce22e071da4cc62ec5e29abf7b1299b8e4a6b0) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
*	SPARK-1879. Increase MaxPermSize since some of our builds have many classes	Matei Zaharia	2014-05-19	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	See https://issues.apache.org/jira/browse/SPARK-1879 -- builds with Hadoop2 and Hive ran out of PermGen space in spark-shell, when those things added up with the Scala compiler. Note that users can still override it by setting their own Java options with this change. Their options will come later in the command string than the -XX:MaxPermSize=128m. Author: Matei Zaharia <matei@databricks.com> Closes #823 from mateiz/spark-1879 and squashes the following commits: 6bc0ee8 [Matei Zaharia] Increase MaxPermSize to 128m since some of our builds have lots of classes (cherry picked from commit 5af99d7617ba3b9fbfdb345ef9571b7dd41f45a1) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
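The override behaviour described above depends only on argument order: the built-in default is placed first and user-supplied options follow it. A small sketch of that ordering, in which `build_java_command` and the main class name are hypothetical and do not reflect the launch scripts' real structure:

```python
def build_java_command(user_opts, main_class, default_opts=("-XX:MaxPermSize=128m",)):
    """Assemble a JVM command line with defaults first and user options after them."""
    return ["java", *default_opts, *user_opts, main_class]


# A user-supplied value appears later in the command string than the default.
command = build_java_command(["-XX:MaxPermSize=256m"], "org.example.Main")
```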
*	[SPARK-1876] Windows fixes to deal with latest distribution layout changes	Matei Zaharia	2014-05-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Look for JARs in the right place - Launch examples the same way as on Unix - Load datanucleus JARs if they exist - Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs - Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was) Author: Matei Zaharia <matei@databricks.com> Closes #819 from mateiz/win-fixes and squashes the following commits: d558f96 [Matei Zaharia] Fix comment 228577b [Matei Zaharia] Review comments d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly 144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout (cherry picked from commit 7b70a7071894dd90ea1d0091542b3e13e7ef8d3a) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-17	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc9	Patrick Wendell	2014-05-17	1	-1/+1
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc8"	Patrick Wendell	2014-05-16	1	-1/+1
\| \| \| \|	This reverts commit 80eea0f111c06260ffaa780d2f3f7facd09c17bc.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-16	1	-1/+1
\| \| \| \|	This reverts commit e5436b8c1a79ce108f3af402455ac5f6dc5d1eb3.
*	Make deprecation warning less severe	Patrick Wendell	2014-05-16	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Just a small change. I think it's good not to scare people who are using the old options. Author: Patrick Wendell <pwendell@gmail.com> Closes #810 from pwendell/warnings and squashes the following commits: cb8a311 [Patrick Wendell] Make deprecation warning less severe (cherry picked from commit 442808a7482b81c8de887c901b424683da62022e) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	[SPARK-1808] Route bin/pyspark through Spark submit	Andrew Or	2014-05-16	4	-18/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem. For `bin/pyspark`, there is currently no way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`. Solution. Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent. Details. `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply pass the file directly to Spark submit and let it handle the rest. For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change is to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case. This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too.
Author: Andrew Or <andrewor14@gmail.com> Closes #799 from andrewor14/pyspark-submit and squashes the following commits: bf37e36 [Andrew Or] Minor changes 01066fa [Andrew Or] bin/pyspark for Windows c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes) 1866f85 [Andrew Or] Windows is not cooperating 456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set 7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit b7ba0d8 [Andrew Or] Address a few comments (minor) 06eb138 [Andrew Or] Use shlex instead of writing our own parser 05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly 6fba412 [Andrew Or] Deal with quotes + address various comments fe4c8a7 [Andrew Or] Update --help for bin/pyspark afe47bf [Andrew Or] Fix spark shell f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a371d26 [Andrew Or] Route bin/pyspark through Spark submit (cherry picked from commit 4b8ec6fcfd7a7ef0857d5b21917183c181301c95) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
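Two of the squashed commits above mention using `shlex` to parse arguments and guarding against a hang when `PYSPARK_SUBMIT_ARGS` is not set. A minimal sketch of that pattern follows; `submit_args_from_env` is a hypothetical helper, not the code in the PR. The guard matters because, in older Python versions, `shlex.split(None)` reads from standard input and therefore blocks.

```python
import os
import shlex


def submit_args_from_env(env=None):
    """Split PYSPARK_SUBMIT_ARGS into an argument list, honouring shell quoting."""
    env = os.environ if env is None else env
    raw = env.get("PYSPARK_SUBMIT_ARGS")
    if raw is None:
        # Never hand None to shlex.split: return an empty argument list instead.
        return []
    return shlex.split(raw)
```

Using `shlex` rather than a hand-written parser means quoted values such as application names containing spaces are kept as single arguments.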
*	SPARK-1864 Look in spark conf instead of system properties when propagating ↵	Michael Armbrust	2014-05-16	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	configuration to executors. Author: Michael Armbrust <michael@databricks.com> Closes #808 from marmbrus/confClasspath and squashes the following commits: 4c31d57 [Michael Armbrust] Look in spark conf instead of system properties when propagating configuration to executors. (cherry picked from commit a80a6a139e729ee3f81ec4f0028e084d2d9f7e82) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-16	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc8	Patrick Wendell	2014-05-16	1	-1/+1
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc7"	Patrick Wendell	2014-05-16	1	-1/+1
\| \| \| \|	This reverts commit 9212b3e5bb5545ccfce242da8d89108e6fb1c464.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-16	1	-1/+1
\| \| \| \|	This reverts commit c4746aa6fe4aaf383e69e34353114d36d1eb9ba6.
*	SPARK-1860: Do not cleanup application work/ directories by default	Aaron Davidson	2014-05-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This causes an unrecoverable error for applications that are running for longer than 7 days that have jars added to the SparkContext, as the jars are cleaned up even though the application is still running. Author: Aaron Davidson <aaron@databricks.com> Closes #800 from aarondav/shitty-defaults and squashes the following commits: a573fbb [Aaron Davidson] SPARK-1860: Do not cleanup application work/ directories by default (cherry picked from commit bb98ecafce196ecc5bc3a1e4cc9264df7b752c6a) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	Typos in Spark	Huajian Mao	2014-05-15	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Author: Huajian Mao <huajianmao@gmail.com> Closes #798 from huajianmao/patch-1 and squashes the following commits: 208a454 [Huajian Mao] A typo in Task 1b515af [Huajian Mao] A typo in the message (cherry picked from commit 94c5139607ec876782e594012a108ebf55fa97db) Signed-off-by: Reynold Xin <rxin@apache.org>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-15	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc7	Patrick Wendell	2014-05-15	1	-1/+1
\|