Commit message | Author | Age | Files | Lines
* BUILD: Adding back CDH4 as per user requests | Patrick Wendell | 2014-08-29 | 1 | -0/+1
|
* [maven-release-plugin] prepare for next development iteration | Patrick Wendell | 2014-08-30 | 24 | -25/+25
|
* [maven-release-plugin] prepare release v1.1.0-rc3 | Patrick Wendell | 2014-08-30 | 24 | -38/+33
|
* Adding new CHANGES.txt | Patrick Wendell | 2014-08-29 | 1 | -0/+45
|
* [SPARK-3320][SQL] Made batched in-memory column buffer building work for SchemaRDDs with empty partitions | Cheng Lian | 2014-08-29 | 3 | -34/+39
|   Author: Cheng Lian <lian.cs.zju@gmail.com>
|   Closes #2213 from liancheng/spark-3320 and squashes the following commits:
|   45a0139 [Cheng Lian] Fixed typo in InMemoryColumnarQuerySuite
|   f67067d [Cheng Lian] Fixed SPARK-3320
|   (cherry picked from commit 32b18dd52cf8920903819f23e406271ecd8ac6bb)
|   Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-3296][mllib] spark-example should be run-example in head notation of DenseKMeans and SparseNaiveBayes | wangfei | 2014-08-29 | 2 | -2/+2
|   `./bin/spark-example` should be `./bin/run-example` in DenseKMeans and SparseNaiveBayes.
|   Author: wangfei <wangfei_hello@126.com>
|   Closes #2193 from scwf/run-example and squashes the following commits:
|   207eb3a [wangfei] spark-example should be run-example
|   27a8999 [wangfei] ./bin/spark-example should be ./bin/run-example
|   (cherry picked from commit 13901764f4e9ed3de03e420d88ab42bdce5d5140)
|   Signed-off-by: Xiangrui Meng <meng@databricks.com>
* Revert "[maven-release-plugin] prepare release v1.1.0-rc2" | Patrick Wendell | 2014-08-29 | 24 | -33/+38
|   This reverts commit 711aebb329ca28046396af1e34395a0df92b5327.
* Revert "[maven-release-plugin] prepare for next development iteration" | Patrick Wendell | 2014-08-29 | 24 | -25/+25
|   This reverts commit a4a7a241441489a0d31365e18476ae2e1c34464d.
* [SPARK-3291][SQL] TestcaseName in createQueryTest should not contain ":" | qiping.lqp | 2014-08-29 | 3 | -1/+4
|   ":" is not allowed to appear in a file name on Windows. If a file name contains ":", the file can't be checked out on a Windows system, and developers using Windows must be careful not to commit the deletion of such files, which is very inconvenient.
|   Author: qiping.lqp <qiping.lqp@alibaba-inc.com>
|   Closes #2191 from chouqin/querytest and squashes the following commits:
|   0e943a1 [qiping.lqp] rename golden file
|   60a863f [qiping.lqp] TestcaseName in createQueryTest should not contain ":"
|   (cherry picked from commit 634d04b87c2744d645e9c26e746ba2006371d9b5)
|   Signed-off-by: Michael Armbrust <michael@databricks.com>
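The constraint the commit above works around can be illustrated with a small sketch. This is a hypothetical helper, not the actual `createQueryTest` change; the forbidden-character set is the one Windows documents for file names.

```python
# Hypothetical sketch: make a test case name safe to use as a golden-file
# name on Windows, where characters such as ":" are forbidden.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')

def sanitize_test_name(name):
    """Replace characters Windows forbids in file names with '_'."""
    return ''.join('_' if c in WINDOWS_FORBIDDEN else c for c in name)

print(sanitize_test_name('case insensitive: literal'))  # → case insensitive_ literal
```

The actual fix simply renamed the offending golden file, but a sanitizer like this prevents the whole class of problem.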
* [SPARK-3269][SQL] Decreases initial buffer size for row set to prevent OOM | Cheng Lian | 2014-08-29 | 1 | -2/+3
|   When a large batch size is specified, `SparkSQLOperationManager` OOMs even if the whole result set is much smaller than the batch size.
|   Author: Cheng Lian <lian.cs.zju@gmail.com>
|   Closes #2171 from liancheng/jdbc-fetch-size and squashes the following commits:
|   5e1623b [Cheng Lian] Decreases initial buffer size for row set to prevent OOM
|   (cherry picked from commit d94a44d7caaf3fe7559d9ad7b10872fa16cf81ca)
|   Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-3234][Build] Fixed environment variables that rely on deprecated command line options in make-distribution.sh | Cheng Lian | 2014-08-29 | 1 | -1/+11
|   Please refer to [SPARK-3234](https://issues.apache.org/jira/browse/SPARK-3234) for details.
|   Author: Cheng Lian <lian.cs.zju@gmail.com>
|   Closes #2208 from liancheng/spark-3234 and squashes the following commits:
|   fb26de8 [Cheng Lian] Fixed SPARK-3234
|   (cherry picked from commit 287c0ac7722dd4bc51b921ccc6f0e3c1625b5ff4)
|   Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [Docs] SQL doc formatting and typo fixes | Nicholas Chammas | 2014-08-29 | 2 | -59/+52
|   As [reported on the dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-1-0-RC2-tp8107p8131.html):
|   * Code fencing with triple-backticks doesn't seem to work like it does on GitHub. Newlines are lost. Instead, use 4-space indent to format small code blocks.
|   * Nested bullets need 2 leading spaces, not 1.
|   * Spellcheck!
|   Author: Nicholas Chammas <nicholas.chammas@gmail.com>
|   Author: nchammas <nicholas.chammas@gmail.com>
|   Closes #2201 from nchammas/sql-doc-fixes and squashes the following commits:
|   873f889 [Nicholas Chammas] [Docs] fix skip-api flag
|   5195e0c [Nicholas Chammas] [Docs] SQL doc formatting and typo fixes
|   3b26c8d [nchammas] [Spark QA] Link to console output on test time out
|   (cherry picked from commit 53aa8316e88980c6f46d3b9fc90d935a4738a370)
|   Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-3307] [PySpark] Fix doc string of SparkContext.broadcast() | Davies Liu | 2014-08-29 | 1 | -2/+0
|   Remove invalid docs.
|   Author: Davies Liu <davies.liu@gmail.com>
|   Closes #2202 from davies/keep and squashes the following commits:
|   aa3b44f [Davies Liu] remove invalid docs
|   (cherry picked from commit e248328b39f52073422a12fd0388208de41be1c7)
|   Signed-off-by: Josh Rosen <joshrosen@apache.org>
* HOTFIX: Bump spark-ec2 version to 1.1.0 | Patrick Wendell | 2014-08-29 | 1 | -1/+1
|
* [maven-release-plugin] prepare for next development iteration | Patrick Wendell | 2014-08-29 | 24 | -25/+25
|
* [maven-release-plugin] prepare release v1.1.0-rc2 | Patrick Wendell | 2014-08-29 | 24 | -38/+33
|
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1" | Patrick Wendell | 2014-08-28 | 24 | -33/+38
|   This reverts commit f07183249b74dd857069028bf7d570b35f265585.
* Revert "[maven-release-plugin] prepare for next development iteration" | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|   This reverts commit f8f7a0c9dce764ece8acdc41d35bbf448dba7e92.
* Adding new CHANGES.txt | Patrick Wendell | 2014-08-28 | 1 | -0/+30
|
* [SPARK-3277] Fix external spilling with LZ4 assertion error | Andrew Or | 2014-08-28 | 5 | -96/+144
|   **Summary of the changes**
|   The bulk of this PR is comprised of tests and documentation; the actual fix is really just adding 1 line of code (see `BlockObjectWriter.scala`). We currently do not run the `External*` test suites with different compression codecs, and this would have caught the bug reported in [SPARK-3277](https://issues.apache.org/jira/browse/SPARK-3277). This PR extends the existing code to test spilling using all compression codecs known to Spark, including `LZ4`.
|   **The bug itself**
|   In `DiskBlockObjectWriter`, we only report the shuffle bytes written before we close the streams. With `LZ4`, all the bytes written reported by our metrics were 0 because `flush()` was not taking effect for some reason. In general, compression codecs may write additional bytes to the file after we call `close()`, and so we must also capture those bytes in our shuffle write metrics.
|   Thanks mridulm and pwendell for help with debugging.
|   Author: Andrew Or <andrewor14@gmail.com>
|   Author: Patrick Wendell <pwendell@gmail.com>
|   Closes #2187 from andrewor14/fix-lz4-spilling and squashes the following commits:
|   1b54bdc [Andrew Or] Speed up tests by not compressing everything
|   1c4624e [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-lz4-spilling
|   6b2e7d1 [Andrew Or] Fix compilation error
|   92e251b [Patrick Wendell] Better documentation for BlockObjectWriter.
|   a1ad536 [Andrew Or] Fix tests
|   089593f [Andrew Or] Actually fix SPARK-3277 (tests still fail)
|   4bbcf68 [Andrew Or] Update tests to actually test all compression codecs
|   b264a84 [Andrew Or] ExternalAppendOnlyMapSuite code style fixes (minor)
|   1bfa743 [Andrew Or] Add more information to assert for better debugging
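The failure mode described in this commit, a compressed stream that only writes its remaining buffered bytes and trailer at `close()`, is easy to reproduce with any buffered codec. A minimal Python sketch using `gzip` as a stand-in for LZ4 (this is an illustration of the class of bug, not Spark's actual metrics code):

```python
import gzip
import os
import tempfile

# Measure the on-disk size before and after close(). A compressed stream
# holds data in buffers and writes its trailer only at close(), so a
# metric captured before close() undercounts the bytes actually written.
path = os.path.join(tempfile.mkdtemp(), "spill.gz")
out = gzip.open(path, "wb")
out.write(b"x" * 100_000)

size_before_close = os.path.getsize(path)   # misses buffered/trailer bytes
out.close()
size_after_close = os.path.getsize(path)    # the real on-disk size

assert size_after_close > size_before_close
```

The fix in the PR follows the same logic: capture the file size for shuffle write metrics after the streams are closed, not before.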
* SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist | Sandy Ryza | 2014-08-28 | 2 | -20/+2
|   Author: Sandy Ryza <sandy@cloudera.com>
|   Closes #1984 from sryza/sandy-spark-3082 and squashes the following commits:
|   fe08c37 [Sandy Ryza] Remove log message entirely
|   85253ad [Sandy Ryza] SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist
|   (cherry picked from commit 92af2314f27e80227174499f2fca505bd551cda7)
|   Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3190] Avoid overflow in VertexRDD.count() | Ankur Dave | 2014-08-28 | 1 | -1/+1
|   VertexRDDs with more than 4 billion elements are counted incorrectly due to integer overflow when summing partition sizes. This PR fixes the issue by converting partition sizes to Longs before summing them. The following code previously returned -10000000; after applying this PR, it returns the correct answer of 5000000000 (5 billion).
|       val pairs = sc.parallelize(0L until 500L).map(_ * 10000000)
|         .flatMap(start => start until (start + 10000000)).map(x => (x, x))
|       VertexRDD(pairs).count()
|   Author: Ankur Dave <ankurdave@gmail.com>
|   Closes #2106 from ankurdave/SPARK-3190 and squashes the following commits:
|   641f468 [Ankur Dave] Avoid overflow in VertexRDD.count()
|   (cherry picked from commit 96df92906978c5f58e0cc8ff5eebe5b35a08be3b)
|   Signed-off-by: Josh Rosen <joshrosen@apache.org>
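The overflow class of bug above can be sketched outside the JVM. Python integers don't wrap, so the sketch below emulates JVM-style 32-bit signed addition to show how summing 500 partition sizes of 10 million each produces a wrong answer with an `Int` accumulator but the correct 5 billion with arbitrary-width (Long-style) arithmetic. The exact wrapped value differs from the -10000000 the commit reports, since the real count sums differently; only the mechanism is the same.

```python
INT32_MAX = 2**31 - 1

def add_int32(a, b):
    """Add with JVM-style 32-bit signed wraparound."""
    s = (a + b) & 0xFFFFFFFF
    return s - 2**32 if s > INT32_MAX else s

partition_sizes = [10_000_000] * 500      # 5 billion elements in total

overflowed = 0
for n in partition_sizes:
    overflowed = add_int32(overflowed, n)  # Int accumulator: wraps around

correct = sum(partition_sizes)             # Long-style accumulator: exact

assert correct == 5_000_000_000
assert overflowed != correct               # the wrapped result is wrong
```

The one-line fix in the PR corresponds to switching the accumulator from the wrapping type to the wide one before summing.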
* [SPARK-3264] Allow users to set executor Spark home in Mesos | Andrew Or | 2014-08-28 | 3 | -8/+22
|   The executors and the driver may not share the same Spark home. There is currently one way to set the executor-side Spark home in Mesos, through setting `spark.home`. However, this is neither documented nor intuitive. This PR adds a more specific config `spark.mesos.executor.home` and exposes this to the user. liancheng tnachen
|   Author: Andrew Or <andrewor14@gmail.com>
|   Closes #2166 from andrewor14/mesos-spark-home and squashes the following commits:
|   b87965e [Andrew Or] Merge branch 'master' of github.com:apache/spark into mesos-spark-home
|   f6abb2e [Andrew Or] Document spark.mesos.executor.home
|   ca7846d [Andrew Or] Add more specific configuration for executor Spark home in Mesos
|   (cherry picked from commit 41dc5987d9abeca6fc0f5935c780d48f517cdf95)
|   Signed-off-by: Andrew Or <andrewor14@gmail.com>
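The resolution order implied by the description, prefer the Mesos-specific key, fall back to `spark.home`, then to the driver's own Spark home, can be sketched as a lookup chain. This is a hypothetical model of the precedence, not Spark's actual Scala implementation:

```python
def executor_spark_home(conf, driver_home):
    """Resolve the executor-side Spark home: most specific key wins,
    then the legacy spark.home, then the driver's own home directory.
    Illustrative only; names mirror the config keys in the commit."""
    return (conf.get("spark.mesos.executor.home")
            or conf.get("spark.home")
            or driver_home)

conf = {"spark.mesos.executor.home": "/opt/spark-on-mesos"}
print(executor_spark_home(conf, "/usr/local/spark"))  # → /opt/spark-on-mesos
```

Modeling it as a fallback chain keeps the old `spark.home` behavior working while letting the new, better-named key take precedence.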
* [SPARK-3150] Fix NullPointerException in Spark recovery: Add initializing default values in DriverInfo.init() | Tatiana Borisova | 2014-08-28 | 1 | -0/+13
|   The issue happens when Spark runs standalone on a cluster. When the master and driver fall simultaneously on one node in a cluster, the master tries to recover its state and restart the Spark driver. While restarting the driver, it fails with an NPE (stacktrace is below). After failing, it restarts and tries to recover its state and restart the Spark driver again, over and over in an infinite cycle. Namely, Spark tries to read DriverInfo state from ZooKeeper, but after reading, DriverInfo.worker happens to be null.
|   https://issues.apache.org/jira/browse/SPARK-3150
|   Author: Tatiana Borisova <tanyatik@yandex.ru>
|   Closes #2062 from tanyatik/spark-3150 and squashes the following commits:
|   9936043 [Tatiana Borisova] Add initializing default values in DriverInfo.init()
|   (cherry picked from commit 70d814665baa8b8ca868d3126452105ecfa5cbff)
|   Signed-off-by: Josh Rosen <joshrosen@apache.org>
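The shape of the fix, transient fields that are not part of the persisted state must be re-established after recovery, can be sketched with Python's pickle hooks. This is an illustrative model of the pattern, not Spark's Scala `DriverInfo` class:

```python
import pickle

class DriverInfo:
    """Model of a class whose transient fields are lost on persistence:
    defaults live in init() so both the constructor and the
    recover-from-persisted-state path can re-apply them."""

    def __init__(self, driver_id):
        self.driver_id = driver_id
        self.init()

    def init(self):
        # Default values for fields that are never persisted.
        self.worker = None
        self.state = "SUBMITTED"

    def __getstate__(self):
        # Only the durable identity is persisted; worker/state are not.
        return {"driver_id": self.driver_id}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.init()   # the gist of the fix: re-apply defaults on recovery

recovered = pickle.loads(pickle.dumps(DriverInfo("driver-20140828")))
assert recovered.worker is None and recovered.state == "SUBMITTED"
```

Without the `init()` call in the recovery path, `recovered.worker` would simply not exist, which mirrors the null field that caused the NPE loop.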
* [maven-release-plugin] prepare for next development iteration | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|
* [maven-release-plugin] prepare release v1.1.0-rc1 | Patrick Wendell | 2014-08-28 | 24 | -38/+33
|
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1" | Patrick Wendell | 2014-08-28 | 24 | -33/+38
|   This reverts commit 58b0be6a29eab817d350729710345e9f39e4c506.
* Revert "[maven-release-plugin] prepare for next development iteration" | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|   This reverts commit 78e3c036eee7113b2ed144eec5061e070b479e56.
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1" | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|   This reverts commit 79e86ef3e1a3ee03a7e3b166a5c7dee11c6d60d7.
* Revert "[maven-release-plugin] prepare for next development iteration" | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|   This reverts commit a118ea5c59d653f5a3feda21455ba60bc722b3b1.
* Revert "Revert "[maven-release-plugin] prepare for next development iteration"" | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|   This reverts commit 71ec0140f7e121bdba3d19e8219e91a5e9d1e320.
* Revert "Revert "[maven-release-plugin] prepare release v1.1.0-rc1"" | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|   This reverts commit 56070f12f455bae645cba887a74c72b12f1085f8.
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1" | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|   This reverts commit da4b94c86c9dd0d624b3040aa4b9449be9f60fc3.
* Revert "[maven-release-plugin] prepare for next development iteration" | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|   This reverts commit 96926c5a42c5970ed74c50db5bd9c68cacf92207.
* [maven-release-plugin] prepare for next development iteration | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|
* [maven-release-plugin] prepare release v1.1.0-rc1 | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|
* Additional CHANGES.txt | Patrick Wendell | 2014-08-28 | 1 | -0/+30
|
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1" | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|   This reverts commit 79e86ef3e1a3ee03a7e3b166a5c7dee11c6d60d7.
* Revert "[maven-release-plugin] prepare for next development iteration" | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|   This reverts commit a118ea5c59d653f5a3feda21455ba60bc722b3b1.
* [SPARK-3230][SQL] Fix udfs that return structs | Michael Armbrust | 2014-08-28 | 4 | -12/+30
|   We need to convert the case classes into Rows.
|   Author: Michael Armbrust <michael@databricks.com>
|   Closes #2133 from marmbrus/structUdfs and squashes the following commits:
|   189722f [Michael Armbrust] Merge remote-tracking branch 'origin/master' into structUdfs
|   8e29b1c [Michael Armbrust] Use existing function
|   d8d0b76 [Michael Armbrust] Fix udfs that return structs
|   (cherry picked from commit 76e3ba4264c4a0bc2c33ae6ac862fc40bc302d83)
|   Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SQL] Fixed 2 comment typos in SQLConf | Cheng Lian | 2014-08-28 | 1 | -3/+4
|   Author: Cheng Lian <lian.cs.zju@gmail.com>
|   Closes #2172 from liancheng/sqlconf-typo and squashes the following commits:
|   115cc71 [Cheng Lian] Fixed 2 comment typos in SQLConf
|   (cherry picked from commit 68f75dcdfe7e8ab229b73824692c4b3d4c39946c)
|   Signed-off-by: Michael Armbrust <michael@databricks.com>
* [maven-release-plugin] prepare for next development iteration | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|
* [maven-release-plugin] prepare release v1.1.0-rc1 | Patrick Wendell | 2014-08-28 | 24 | -25/+25
|
* HOTFIX: Don't build with YARN support for Mapr3 | Patrick Wendell | 2014-08-27 | 1 | -1/+1
|
* [HOTFIX][SQL] Remove cleaning of UDFs | Michael Armbrust | 2014-08-27 | 1 | -3/+0
|   It is not safe to run the closure cleaner on slaves. #2153 introduced this, which broke all UDF execution on slaves. Will re-add cleaning of UDF closures in a follow-up PR.
|   Author: Michael Armbrust <michael@databricks.com>
|   Closes #2174 from marmbrus/fixUdfs and squashes the following commits:
|   55406de [Michael Armbrust] [HOTFIX] Remove cleaning of UDFs
|   (cherry picked from commit 024178c57419f915d26414e1b91ea0019c3650db)
|   Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [HOTFIX] Wait for EOF only for the PySpark shell | Andrew Or | 2014-08-27 | 2 | -11/+17
|   In `SparkSubmitDriverBootstrapper`, we wait for the parent process to send us an `EOF` before finishing the application. This is applicable for the PySpark shell because we terminate the application the same way. However, if we run a Python application, for instance, the JVM actually never exits unless it receives a manual EOF from the user. This is causing a few tests to time out. We only need to do this for the PySpark shell because Spark submit runs as a Python subprocess only in this case. Thus, the normal Spark shell doesn't need to go through this case even though it is also a REPL. Thanks davies for reporting this.
|   Author: Andrew Or <andrewor14@gmail.com>
|   Closes #2170 from andrewor14/bootstrap-hotfix and squashes the following commits:
|   42963f5 [Andrew Or] Do not wait for EOF unless this is the pyspark shell
|   (cherry picked from commit dafe343499bbc688e266106e4bb897f9e619834e)
|   Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [maven-release-plugin] prepare for next development iteration | Patrick Wendell | 2014-08-27 | 24 | -25/+25
|
* [maven-release-plugin] prepare release v1.1.0-rc1 | Patrick Wendell | 2014-08-27 | 24 | -38/+33
|
* BUILD: Updating CHANGES.txt for Spark 1.1 | Patrick Wendell | 2014-08-27 | 2 | -2/+14472
|
* Add line continuation for script to work w/ py2.7.5 | Matthew Farrellee | 2014-08-27 | 1 | -2/+2
|   The error was:
|       $ SPARK_HOME=$PWD/dist ./dev/create-release/generate-changelist.py
|         File "./dev/create-release/generate-changelist.py", line 128
|           if day < SPARK_REPO_CHANGE_DATE1 or
|                                             ^
|       SyntaxError: invalid syntax
|   Author: Matthew Farrellee <matt@redhat.com>
|   Closes #2139 from mattf/master-fix-generate-changelist.py-0 and squashes the following commits:
|   6b3a900 [Matthew Farrellee] Add line continuation for script to work w/ py2.7.5
|   (cherry picked from commit 64d8ecbbe94c47236ff2d8c94d7401636ba6fca4)
|   Signed-off-by: Patrick Wendell <pwendell@gmail.com>
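The syntax error above comes from splitting a condition across lines with a bare trailing `or` and no continuation. Wrapping the condition in parentheses (or ending the line with `\`) fixes it on any Python version. A minimal sketch; the date values and variable names below are placeholders, not the script's real constants:

```python
# Placeholder dates standing in for the script's SPARK_REPO_CHANGE_DATE* constants.
SPARK_REPO_CHANGE_DATE1 = 20140601
SPARK_REPO_CHANGE_DATE2 = 20140801
day = 20140715

# Broken (SyntaxError on any Python, including 2.7.5):
#   if day < SPARK_REPO_CHANGE_DATE1 or
#       day > SPARK_REPO_CHANGE_DATE2:
#
# Fixed: parentheses make the line continuation explicit.
if (day < SPARK_REPO_CHANGE_DATE1 or
        day > SPARK_REPO_CHANGE_DATE2):
    outside = True
else:
    outside = False

print(outside)  # → False
```

A trailing backslash (`... or \`) works too, but parenthesized continuations are the more robust idiom since stray whitespace after a backslash still breaks parsing.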