Commit message | Author | Age | Files | Lines
* [maven-release-plugin] prepare release v1.1.1-rc2 (tag: v1.1.1)
  Andrew Or, 2014-11-19 (24 files, -25/+25)

* Update CHANGES.txt for 1.1.1-rc2
  Andrew Or, 2014-11-19 (1 file, -0/+65)
* [SPARK-4480] Avoid many small spills in external data structures (1.1)
  Andrew Or, 2014-11-19 (4 files, -12/+37)

  This is the branch-1.1 version of #3353. This requires a separate PR because the code in master has been refactored a little to eliminate duplicate code. I have tested this on a standalone cluster. The goal is to merge this into 1.1.1.

  Author: Andrew Or <andrew@databricks.com>
  Closes #3354 from andrewor14/avoid-small-spills-1.1 and squashes the following commits:
    f2e552c [Andrew Or] Fix tests
    7012595 [Andrew Or] Avoid many small spills
* [SPARK-4380] Log more precise number of bytes spilled (1.1)
  Andrew Or, 2014-11-18 (2 files, -4/+6)

  This is the branch-1.1 version of #3243.

  Author: Andrew Or <andrew@databricks.com>
  Closes #3355 from andrewor14/spill-log-bytes-1.1 and squashes the following commits:
    36ec152 [Andrew Or] Log more precise representation of bytes in spilling code
* [SPARK-4468][SQL] Backports #3334 to branch-1.1
  Cheng Lian, 2014-11-18 (2 files, -45/+75)

  Author: Cheng Lian <lian@databricks.com>
  Closes #3338 from liancheng/spark-3334-for-1.1 and squashes the following commits:
    bd17512 [Cheng Lian] Backports #3334 to branch-1.1
* [SPARK-4433] fix a racing condition in zipWithIndex
  Xiangrui Meng, 2014-11-18 (2 files, -14/+22)

  Spark hangs with the following code:
  ~~~
  sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
  ~~~
  This is because ZippedWithIndexRDD triggers a job in getPartitions and it causes a deadlock in DAGScheduler.getPreferredLocs (synced). The fix is to compute `startIndices` during construction. This should be applied to branch-1.0, branch-1.1, and branch-1.2. pwendell

  Author: Xiangrui Meng <meng@databricks.com>
  Closes #3291 from mengxr/SPARK-4433 and squashes the following commits:
    c284d9f [Xiangrui Meng] fix a racing condition in zipWithIndex
  (cherry picked from commit bb46046154a438df4db30a0e1fd557bd3399ee7b)
  Signed-off-by: Xiangrui Meng <meng@databricks.com>
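The eager fix described in this commit message boils down to a prefix sum over per-partition element counts, computed once at RDD construction instead of lazily inside getPartitions. A minimal, Spark-free sketch of that computation (the object name and counts are illustrative, not the actual ZippedWithIndexRDD code):

```scala
object StartIndices {
  // Global index at which each partition's zipWithIndex numbering starts:
  // a prefix sum over the per-partition element counts.
  def compute(counts: Array[Long]): Array[Long] =
    counts.scanLeft(0L)(_ + _).init
}
```

For partitions holding 3, 4, and 2 elements this yields start indices 0, 3, and 7. Doing the computation eagerly means getPartitions never has to submit a job while DAGScheduler holds its lock, which is what caused the hang.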
* [SPARK-4393] Fix memory leak in ConnectionManager ACK timeout TimerTasks; use HashedWheelTimer (For branch-1.1)
  Kousuke Saruta, 2014-11-18 (1 file, -13/+39)

  This patch is intended to fix a subtle memory leak in ConnectionManager's ACK timeout TimerTasks: in the old code, each TimerTask held a reference to the message being sent, and a cancelled TimerTask won't necessarily be garbage-collected until it's scheduled to run, so this caused huge buildups of messages that weren't garbage collected until their timeouts expired, leading to OOMs. This patch addresses this problem by capturing only the message ID in the TimerTask instead of the whole message, and by keeping a WeakReference to the promise in the TimerTask. I've also modified this code to use Netty's HashedWheelTimer, whose performance characteristics should be better for this use-case.

  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
  Closes #3321 from sarutak/connection-manager-timeout-bugfix and squashes the following commits:
    786af91 [Kousuke Saruta] Fixed memory leak issue of ConnectionManager
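The fix pattern described above can be sketched without any Spark or Netty dependencies: the timeout task captures only the message ID, so a cancelled task that lingers in the timer can no longer pin the full message. Names here are hypothetical, not the actual ConnectionManager types:

```scala
final case class Message(id: Long, payload: Array[Byte])

// The task closes over only `messageId` and a callback; it does NOT hold a
// Message, so a cancelled-but-unexpired task cannot keep payloads alive.
// (The real patch additionally holds the promise via a WeakReference and
// schedules tasks on Netty's HashedWheelTimer.)
final class AckTimeoutTask(val messageId: Long, onTimeout: Long => Unit)
    extends Runnable {
  override def run(): Unit = onTimeout(messageId)
}
```

The leaky variant would take a `Message` in its constructor; with many in-flight sends and long timeouts, those retained payloads are exactly the buildup the commit message describes.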
* [SPARK-4467] Partial fix for fetch failure in sort-based shuffle (1.1)
  Andrew Or, 2014-11-17 (1 file, -0/+1)

  This is the 1.1 version of #3302. There has been some refactoring in master so we can't cherry-pick that PR.

  Author: Andrew Or <andrew@databricks.com>
  Closes #3330 from andrewor14/sort-fetch-fail and squashes the following commits:
    486fc49 [Andrew Or] Reset `elementsRead`
* Revert "[maven-release-plugin] prepare release v1.1.1-rc1"
  Andrew Or, 2014-11-17 (24 files, -25/+25)

  This reverts commit 72a4fdbe82203b962fe776d0edaed7f56898cb02.

* Revert "[maven-release-plugin] prepare for next development iteration"
  Andrew Or, 2014-11-17 (24 files, -25/+25)

  This reverts commit 685bdd2b7e584c84e7d39e40de2d5f30c5388cb5.

* Revert "[SPARK-4075] [Deploy] Jar url validation is not enough for Jar file"
  Andrew Or, 2014-11-17 (2 files, -16/+1)

  This reverts commit 098f83c7ccd7dad9f9228596da69fe5f55711a52.
* [branch-1.1][SPARK-4355] OnlineSummarizer doesn't merge mean correctly
  Xiangrui Meng, 2014-11-13 (1 file, -11/+9)

  andrewor14 This backports the bug fix in #3220. It would be good if we can get it in 1.1.1. But this is minor.

  Author: Xiangrui Meng <meng@databricks.com>
  Closes #3251 from mengxr/SPARK-4355-1.1 and squashes the following commits:
    33886b6 [Xiangrui Meng] Merge remote-tracking branch 'apache/branch-1.1' into SPARK-4355-1.1
    91fe1a3 [Xiangrui Meng] fix OnlineSummarizer.merge when other.mean is zero
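The arithmetic behind this fix is a weight-proportional mean merge: a zero mean on one side must still contribute its sample count to the denominator. A standalone sketch of the idea, not the actual OnlineSummarizer code:

```scala
object MeanMerge {
  // Merge two running means, weighting each by its sample count. Skipping
  // the update when mean2 == 0.0 would ignore n2 zero-valued samples and
  // skew the merged mean.
  def merge(mean1: Double, n1: Long, mean2: Double, n2: Long): Double = {
    val total = n1 + n2
    if (total == 0L) 0.0 else (mean1 * n1 + mean2 * n2) / total
  }
}
```

For example, merging mean 2.0 over 2 samples with mean 0.0 over 2 samples must give 1.0, not 2.0; treating the zero-mean side as "nothing to merge" is exactly the bug the commit title describes.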
* [maven-release-plugin] prepare for next development iteration
  Andrew Or, 2014-11-13 (24 files, -25/+25)

* [maven-release-plugin] prepare release v1.1.1-rc1
  Andrew Or, 2014-11-13 (24 files, -25/+25)

* Revert "[maven-release-plugin] prepare release v1.1.1-rc1"
  Andrew Or, 2014-11-12 (24 files, -25/+25)

  This reverts commit 3f9e073ff0bb18b6079fda419d4e9dbf594545b0.

* Revert "[maven-release-plugin] prepare for next development iteration"
  Andrew Or, 2014-11-12 (24 files, -25/+25)

  This reverts commit 6de888129fcfe6e592458a4217fc66140747b54f.
* [Release] Correct make-distribution.sh log path
  Andrew Or, 2014-11-12 (1 file, -1/+1)

* [Release] Bring audit scripts up-to-date
  Andrew Or, 2014-11-13 (4 files, -286/+75)

  This involves a few main changes:
  - Log all output messages to the log file. Previously the log file was not useful because it did not indicate progress.
  - Remove hive-site.xml in sbt_hive_app to avoid interference.
  - Add the appropriate repositories for new dependencies.
* [maven-release-plugin] prepare for next development iteration
  Andrew Or, 2014-11-12 (24 files, -25/+25)

* [maven-release-plugin] prepare release v1.1.1-rc1
  Andrew Or, 2014-11-12 (24 files, -25/+25)

* Revert "[maven-release-plugin] prepare release v1.1.1-rc1"
  Andrew Or, 2014-11-12 (24 files, -25/+25)

  This reverts commit 7029301778895427216f2e0710c6e72a523c0897.

* Revert "[maven-release-plugin] prepare for next development iteration"
  Andrew Or, 2014-11-12 (24 files, -25/+25)

  This reverts commit db22a9e2cb51eae2f8a79648ce3c6bf4fecdd641.
* [maven-release-plugin] prepare for next development iteration
  Andrew Or, 2014-11-12 (24 files, -25/+25)

* [maven-release-plugin] prepare release v1.1.1-rc1
  Andrew Or, 2014-11-12 (24 files, -25/+25)

* [Release] Log build output for each distribution
  Andrew Or, 2014-11-12 (1 file, -1/+2)
* Revert "[maven-release-plugin] prepare release v1.1.1-rc1"
  Andrew Or, 2014-11-12 (24 files, -25/+25)

  This reverts commit 837deabebf0714e3f3aca135d77169cc825824f3.

* [maven-release-plugin] prepare release v1.1.1-rc1
  Andrew Or, 2014-11-12 (24 files, -25/+25)

* Revert "[maven-release-plugin] prepare release v1.1.1-rc1"
  Andrew Or, 2014-11-12 (24 files, -25/+25)

  This reverts commit f3e62ffa4ccea62911207b918ef1c23c1f50467f.

  Conflicts:
    pom.xml
* Revert "[maven-release-plugin] prepare for next development iteration"
  Andrew Or, 2014-11-12 (24 files, -25/+25)

  This reverts commit 5c0032a471d858fb010b1737ea14375f1af3ed88.

* Revert "SPARK-3039: Allow spark to be built using avro-mapred for hadoop2"
  Andrew Or, 2014-11-12 (2 files, -14/+0)

  This reverts commit 78887f94a0ae9cdcfb851910ab9c7d51a1ef2acb.

  Conflicts:
    pom.xml
* [maven-release-plugin] prepare for next development iteration
  Andrew Or, 2014-11-11 (24 files, -25/+25)

* [maven-release-plugin] prepare release v1.1.1-rc1
  Andrew Or, 2014-11-11 (24 files, -26/+26)

* Update CHANGES.txt
  Andrew Or, 2014-11-11 (2 files, -3/+684)
* [SPARK-4295][External] Fix exception in SparkSinkSuite
  maji2014, 2014-11-11 (2 files, -0/+30)

  Handle exception in SparkSinkSuite; please refer to [SPARK-4295].

  Author: maji2014 <maji3@asiainfo.com>
  Closes #3177 from maji2014/spark-4295 and squashes the following commits:
    312620a [maji2014] change a new statement for spark-4295
    24c3d21 [maji2014] add log4j.properties for SparkSinkSuite and spark-4295
    c807bf6 [maji2014] Fix exception in SparkSinkSuite
  (cherry picked from commit f8811a5695af2dfe156f07431288db7b8cd97159)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
* [branch-1.1][SPARK-3990] add a note on ALS usage
  Xiangrui Meng, 2014-11-10 (1 file, -0/+12)

  Because we switched back to Kryo in #3187, we need to leave a note about the workaround.

  Author: Xiangrui Meng <meng@databricks.com>
  Closes #3190 from mengxr/SPARK-3990-1.1 and squashes the following commits:
    d4818f3 [Xiangrui Meng] fix python style
    53725b0 [Xiangrui Meng] add a note about SPARK-3990
    56ad70e [Xiangrui Meng] add a note about SPARK-3990
* [BRANCH-1.1][SPARK-2652] change the default spark.serializer in pyspark back to Kryo
  Xiangrui Meng, 2014-11-10 (1 file, -0/+1)

  This reverts #2916. We shouldn't change the default settings in a minor release. JoshRosen davies

  Author: Xiangrui Meng <meng@databricks.com>
  Closes #3187 from mengxr/SPARK-2652-1.1 and squashes the following commits:
    372166b [Xiangrui Meng] change the default spark.serializer in pyspark back to Kryo
* [SPARK-4330][Doc] Link to proper URL for YARN overview
  Kousuke Saruta, 2014-11-10 (1 file, -1/+1)

  running-on-yarn.md contains a link to the YARN overview, but the URL points to the YARN alpha documentation when it should point to the stable documentation.

  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
  Closes #3196 from sarutak/SPARK-4330 and squashes the following commits:
    30baa21 [Kousuke Saruta] Fixed running-on-yarn.md to point proper URL for YARN
  (cherry picked from commit 3c07b8f08240bafcdff5d174989fb433f4bc80b6)
  Signed-off-by: Matei Zaharia <matei@databricks.com>
* [SQL] Backport backtick and smallint JDBC fixes to 1.1
  ravipesala, 2014-11-10 (3 files, -5/+17)

  Author: Michael Armbrust <michael@databricks.com>
  Author: ravipesala <ravindra.pesala@huawei.com>
  Author: scwf <wangfei1@huawei.com>
  Closes #3199 from marmbrus/backport1.1 and squashes the following commits:
    019a0dd [Michael Armbrust] Drop incorrectly ported test cases
    4c9f3e6 [ravipesala] [SPARK-3708][SQL] Backticks aren't handled correctly in aliases
    064750d [scwf] [SPARK-3704][SQL] Fix ColumnValue type for Short values in thrift server
    f4e17cd [ravipesala] [SPARK-3834][SQL] Backticks not correctly handled in subquery aliases
* Update versions for 1.1.1 release
  Andrew Or, 2014-11-10 (7 files, -9/+9)
* [SPARK-3495][SPARK-3496] Backporting block replication fixes made in master to branch 1.1
  Tathagata Das, 2014-11-10 (8 files, -44/+535)

  The original PR was #2366. This backport was non-trivial because Spark 1.1 uses ConnectionManager instead of NioBlockTransferService, which required slight modification to unit tests. Other than that, the code is exactly the same as in the original PR. Please refer to discussion in the original PR if you have any thoughts.

  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #3191 from tdas/replication-fix-branch-1.1-backport and squashes the following commits:
    593214a [Tathagata Das] Merge remote-tracking branch 'apache-github/branch-1.1' into branch-1.1
    2ed927f [Tathagata Das] Fixed error in unit test.
    de4ff73 [Tathagata Das] [SPARK-3495] Block replication fails continuously when the replication target node is dead AND [SPARK-3496] Block replication by mistake chooses driver as target
* [SPARK-3954][Streaming] Optimization to FileInputDStream
  surq, 2014-11-10 (1 file, -3/+4)

  When converting files to RDDs, the Spark source made three separate loops over the files sequence: 1. files.map(...), 2. files.zip(fileRDDs), 3. files-size.foreach. This is very time-consuming when there are lots of files, so this change collapses the three loops over the files sequence into a single loop.

  Author: surq <surq@asiainfo.com>
  Closes #2811 from surq/SPARK-3954 and squashes the following commits:
    321bbe8 [surq] updated the code style. The style from [for...yield] to [files.map(file=>{})]
    88a2c20 [surq] Merge branch 'master' of https://github.com/apache/spark into SPARK-3954
    178066f [surq] modify code's style. [Exceeds 100 columns]
    626ef97 [surq] remove redundant import(ArrayBuffer)
    739341f [surq] promote the speed of convert files to RDDS
  (cherry picked from commit ce6ed2abd14de26b9ceaa415e9a42fbb1338f5fa)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
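The three-loops-to-one change can be illustrated with plain collections; `load` here stands in for the per-file RDD construction and is hypothetical, not the actual FileInputDStream code:

```scala
object SingleTraversal {
  // Before: three separate traversals of the file sequence.
  def threePass(files: Seq[String], load: String => Int): Seq[(String, Int)] = {
    val fileRDDs = files.map(load)     // pass 1: build one value per file
    val paired = files.zip(fileRDDs)   // pass 2: pair files with values
    files.foreach(_ => ())             // pass 3: e.g. per-file logging
    paired
  }

  // After: a single traversal does all of the per-file work.
  def onePass(files: Seq[String], load: String => Int): Seq[(String, Int)] =
    files.map(file => (file, load(file)))
}
```

Both produce the same pairs; the fused version simply walks the sequence once instead of three times, which is what matters when a batch contains many files.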
* [SPARK-3971][SQL] Backport #2843 to branch-1.1
  Cheng Lian, 2014-11-10 (13 files, -292/+307)

  This PR backports #2843 to branch-1.1. The key difference is that this one doesn't support Hive 0.13.1 and thus always returns `0.12.0` when `spark.sql.hive.version` is queried. 6 other commits on which #2843 depends were also backported; they are:
  - #2887 for `SessionState` lifecycle control
  - #2675, #2823 & #3060 for major test suite refactoring and bug fixes
  - #2164, for Parquet test suites updates
  - #2493, for reading `spark.sql.*` configurations

  Author: Cheng Lian <lian@databricks.com>
  Author: Cheng Lian <lian.cs.zju@gmail.com>
  Author: Michael Armbrust <michael@databricks.com>
  Closes #3113 from liancheng/get-info-for-1.1 and squashes the following commits:
    d354161 [Cheng Lian] Provides Spark and Hive version in HiveThriftServer2 for branch-1.1
    0c2a244 [Michael Armbrust] [SPARK-3646][SQL] Copy SQL configuration from SparkConf when a SQLContext is created.
    3202a36 [Michael Armbrust] [SQL] Decrease partitions when testing
    7f395b7 [Cheng Lian] [SQL] Fixes race condition in CliSuite
    0dd28ec [Cheng Lian] [SQL] Fixes the race condition that may cause test failure
    5928b39 [Cheng Lian] [SPARK-3809][SQL] Fixes test suites in hive-thriftserver
    faeca62 [Cheng Lian] [SPARK-4037][SQL] Removes the SessionState instance created in HiveThriftServer2
* [SPARK-4308][SQL] Follow up of #3175 for branch 1.1
  Cheng Lian, 2014-11-10 (1 file, -0/+1)

  PR #3175 is for the master branch only and can't be backported to branch 1.1 directly because of Hive 0.13.1 support.

  Author: Cheng Lian <lian@databricks.com>
  Closes #3176 from liancheng/fix-op-state-for-1.1 and squashes the following commits:
    8791d87 [Cheng Lian] This is a follow up of #3175 for branch 1.1
* [SPARK-2548][HOTFIX][Streaming] Removed use of o.a.s.streaming.Durations in branch 1.1
  Tathagata Das, 2014-11-10 (2 files, -4/+4)

  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #3188 from tdas/branch-1.1 and squashes the following commits:
    f1996d3 [Tathagata Das] [SPARK-2548][HOTFIX] Removed use of o.a.s.streaming.Durations
* Update RecoverableNetworkWordCount.scala
  comcmipi, 2014-11-10 (1 file, -2/+3)

  Trying this example, I missed the moment when the checkpoint was initiated.

  Author: comcmipi <pitonak@fns.uniba.sk>
  Closes #2735 from comcmipi/patch-1 and squashes the following commits:
    b6d8001 [comcmipi] Update RecoverableNetworkWordCount.scala
    96fe274 [comcmipi] Update RecoverableNetworkWordCount.scala
  (cherry picked from commit 0340c56a921d4eb4bc9058e25e926721f8df594c)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
* SPARK-2548 [STREAMING] JavaRecoverableWordCount is missing
  Sean Owen, 2014-11-10 (3 files, -17/+159)

  Here's my attempt to re-port `RecoverableNetworkWordCount` to Java, following the example of its Scala and Java siblings. I fixed a few minor doc/formatting issues along the way, I believe.

  Author: Sean Owen <sowen@cloudera.com>
  Closes #2564 from srowen/SPARK-2548 and squashes the following commits:
    0d0bf29 [Sean Owen] Update checkpoint call as in https://github.com/apache/spark/pull/2735
    35f23e3 [Sean Owen] Remove old comment about running in standalone mode
    179b3c2 [Sean Owen] Re-port RecoverableNetworkWordCount to Java example, and touch up doc / formatting in related examples
  (cherry picked from commit 3a02d416cd82a7a942fd6ff4a0e05ff070eb218a)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
* [SPARK-4169] [Core] Accommodate non-English Locales in unit tests
  Niklas Wilcke, 2014-11-10 (2 files, -12/+15)

  For me the core tests failed because there are two locale-dependent parts in the code. Look at the Jira ticket for details. Why is it necessary to check the exception message in isBindCollision in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1686 ?

  Author: Niklas Wilcke <1wilcke@informatik.uni-hamburg.de>
  Closes #3036 from numbnut/core-test-fix and squashes the following commits:
    1fb0d04 [Niklas Wilcke] Fixing locale dependend code and tests
  (cherry picked from commit ed8bf1eac548577c4bbad7ce3f7f301a2f52ef17)
  Signed-off-by: Andrew Or <andrew@databricks.com>
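The failure mode here is easy to reproduce: format-string output depends on the JVM's default locale unless a locale is passed explicitly. A minimal sketch of the fix direction (pinning Locale.US is one option; the method names are illustrative, not Spark's):

```scala
import java.util.Locale

object LocaleSafe {
  // Locale-dependent: uses the JVM default, so "3.1" on an English
  // machine but "3,1" under e.g. a German default locale.
  def fragile(x: Double): String = "%.1f".format(x)

  // Locale-pinned: produces the same string on every machine.
  def pinned(x: Double): String = "%.1f".formatLocal(Locale.US, x)
}
```

Tests that assert on formatted strings should either pin the locale like this or set a known default locale in test setup.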
* [SPARK-4301] StreamingContext should not allow start() to be called after calling stop()
  Josh Rosen, 2014-11-08 (2 files, -19/+40)

  In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op which has no side-effects. This allows users to call `stop()` on a fresh StreamingContext followed by `start()`. I believe that this almost always indicates an error and is not behavior that we should support. Since we don't allow `start() stop() start()`, I don't think it makes sense to allow `stop() start()`.

  The current behavior can lead to resource leaks when StreamingContext constructs its own SparkContext: if I call `stop(stopSparkContext=True)`, then I expect StreamingContext's underlying SparkContext to be stopped irrespective of whether the StreamingContext has been started. This is useful when writing unit test fixtures.

  Prior discussions:
  - https://github.com/apache/spark/pull/3053#discussion-diff-19710333R490
  - https://github.com/apache/spark/pull/3121#issuecomment-61927353

  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #3160 from JoshRosen/SPARK-4301 and squashes the following commits:
    dbcc929 [Josh Rosen] Address more review comments
    bdbe5da [Josh Rosen] Stop SparkContext after stopping scheduler, not before.
    03e9c40 [Josh Rosen] Always stop SparkContext, even if stop(false) has already been called.
    832a7f4 [Josh Rosen] Address review comment
    5142517 [Josh Rosen] Add tests; improve Scaladoc.
    813e471 [Josh Rosen] Revert workaround added in https://github.com/apache/spark/pull/3053/files#diff-e144dbee130ed84f9465853ddce65f8eR49
    5558e70 [Josh Rosen] StreamingContext.stop() should stop SparkContext even if StreamingContext has not been started yet.
  (cherry picked from commit 7b41b17f3296eea3282efbdceb6b28baf128287d)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
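The lifecycle rule this change enforces can be captured as a tiny state machine, sketched here independently of Spark (class and state names are made up for illustration):

```scala
object CtxState extends Enumeration {
  val Initialized, Started, Stopped = Value
}

// Once stopped, a context can never be (re)started; stop() is always a
// legal terminal transition, mirroring the behavior described above.
final class LifecycleGuard {
  import CtxState._
  private var state: Value = Initialized

  def start(): Unit = state match {
    case Initialized => state = Started
    case Started => throw new IllegalStateException("already started")
    case Stopped => throw new IllegalStateException("cannot start after stop()")
  }

  def stop(): Unit = state = Stopped
}
```

Making the transition table explicit is what rules out both `start() stop() start()` and `stop() start()` while keeping `stop()` idempotent.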
* [SPARK-4304] [PySpark] Fix sort on empty RDD
  Davies Liu, 2014-11-07 (2 files, -0/+5)

  This PR fixes sortBy()/sortByKey() on an empty RDD. This should be backported into 1.1/1.2.

  Author: Davies Liu <davies@databricks.com>
  Closes #3162 from davies/fix_sort and squashes the following commits:
    84f64b7 [Davies Liu] add tests
    52995b5 [Davies Liu] fix sortByKey() on empty RDD
  (cherry picked from commit 7779109796c90d789464ab0be35917f963bbe867)
  Signed-off-by: Josh Rosen <joshrosen@databricks.com>

  Conflicts:
    python/pyspark/tests.py
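The general failure shape for distributed sorts is that range-partition bounds are computed from a sample, and an empty dataset yields an empty sample that must produce zero bounds rather than an error. A sketch of that idea only; this is not PySpark's actual partitioning code:

```scala
object RangeBounds {
  // Split points for range-partitioned sorting. An empty sample (empty
  // dataset) must yield no bounds instead of failing.
  def bounds[T: Ordering](sample: Seq[T], partitions: Int): Seq[T] =
    if (sample.isEmpty || partitions <= 1) Seq.empty
    else {
      val sorted = sample.sorted
      (1 until partitions).map { i =>
        sorted(math.min(sorted.length - 1, i * sorted.length / partitions))
      }
    }
}
```

With zero bounds everything lands in a single (empty) partition, so sorting an empty collection degenerates to a no-op, which is the behavior the fix restores.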
* Update JavaCustomReceiver.java
  xiao321, 2014-11-07 (1 file, -1/+1)

  Array index out of bounds.

  Author: xiao321 <1042460381@qq.com>
  Closes #3153 from xiao321/patch-1 and squashes the following commits:
    0ed17b5 [xiao321] Update JavaCustomReceiver.java
  (cherry picked from commit 7c9ec529a3483fab48f728481dd1d3663369e50a)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>