spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge remote-tracking branch 'apache/master' into conf2	Matei Zaharia	2013-12-31	19	-107/+120
\|\	Conflicts:
\| \|	core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
\| \|	streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
\| \|	streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
\| *	Merge pull request #238 from ngbinh/upgradeNetty	Patrick Wendell	2013-12-31	6	-42/+58
\| \|\	upgrade Netty from 4.0.0.Beta2 to 4.0.13.Final
\| \| \|	the changes are listed at https://github.com/netty/netty/wiki/New-and-noteworthy
\| \| *	Fix failed unit tests	Binh Nguyen	2013-12-27	3	-13/+24
\| \| \|	Also clean up a bit.
\| \| *	Fix imports order	Binh Nguyen	2013-12-24	3	-5/+2
\| \| \|
\| \| *	Remove import * and fix some formatting	Binh Nguyen	2013-12-24	2	-7/+4
\| \| \|
\| \| *	upgrade Netty from 4.0.0.Beta2 to 4.0.13.Final	Binh Nguyen	2013-12-24	5	-29/+40
\| \| \|
\| * \|	Merge pull request #289 from tdas/filestream-fix	Patrick Wendell	2013-12-31	5	-45/+44
\| \|\ \	Bug fixes for file input stream and checkpointing
\| \| \| \|	- Fixed bugs in the file input stream that led the stream to fail due to transient HDFS errors (listing files while a background thread is deleting them failed and caused errors, etc.)
\| \| \| \|	- Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, to allow checkpoints to be written to any HDFS-compatible store requiring special configuration.
\| \| \| \|	- Changed the API of SparkContext.setCheckpointDir() - eliminated the unnecessary 'useExisting' parameter. Now SparkContext will always create a unique subdirectory within the user-specified checkpoint directory. This is to ensure that previous checkpoint files are not accidentally overwritten.
\| \| \| \|	- Fixed bug where setting the checkpoint directory as a relative local path caused the checkpointing to fail.
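
The unique-subdirectory behaviour described in the PR 289 message can be illustrated with a small sketch. This is not Spark's code: `make_unique_checkpoint_dir` is a hypothetical helper, it works on the local filesystem only, and the use of a random UUID as the subdirectory name is an assumption made for the example.

```python
import os
import tempfile
import uuid


def make_unique_checkpoint_dir(base_dir: str) -> str:
    """Create and return a fresh subdirectory under base_dir (hypothetical helper)."""
    # Resolve relative paths up front so a later change of working
    # directory cannot redirect the checkpoint location.
    base = os.path.abspath(base_dir)
    # Assumption for this sketch: a random UUID names the subdirectory,
    # so files from earlier runs are never overwritten.
    target = os.path.join(base, str(uuid.uuid4()))
    os.makedirs(target)  # raises FileExistsError if the leaf already exists
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        first = make_unique_checkpoint_dir(root)
        second = make_unique_checkpoint_dir(root)
        # Two calls with the same base directory yield two distinct subdirectories.
        print(first != second)
```

Because every call produces a new directory, a caller no longer has to decide whether an existing directory may be reused, which is the role the removed 'useExisting' parameter used to play.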
\| \| * \|	Fixed comments and long lines based on comments on PR 289.	Tathagata Das	2013-12-31	1	-1/+2
\| \| \| \|
\| \| * \|	Fixed Python API for sc.setCheckpointDir. Also other fixes based on Reynold's comments on PR 289.	Tathagata Das	2013-12-24	3	-5/+3
\| \| \| \|
\| \| * \|	Merge branch 'apache-master' into filestream-fix	Tathagata Das	2013-12-24	30	-113/+395
\| \| \|\\|
\| \| * \|	Merge branch 'scheduler-update' into filestream-fix	Tathagata Das	2013-12-19	111	-632/+740
\| \| \|\ \	Conflicts:
\| \| \| \| \|	core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
\| \| \| \| \|	streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
\| \| \| \| \|	streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
\| \| \| \| \|	streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
\| \| \| \| \|	streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
\| \| * \| \|	Fixed multiple file stream and checkpointing bugs.	Tathagata Das	2013-12-11	5	-43/+42
\| \| \| \| \|	- Made file stream more robust to transient failures.
\| \| \| \| \|	- Changed Spark.setCheckpointDir API to not have the second 'useExisting' parameter. Spark will always create a unique directory for checkpointing underneath the directory provided to the function.
\| \| \| \| \|	- Fixed bug wrt local relative paths as checkpoint directory.
\| \| \| \| \|	- Made DStream and RDD checkpointing use SparkContext.hadoopConfiguration, so that more HDFS-compatible filesystems are supported for checkpointing.
\| * \| \| \|	Merge pull request #308 from kayousterhout/stage_naming	Patrick Wendell	2013-12-30	7	-14/+18
\| \|\ \ \ \	Changed naming of StageCompleted event to be consistent
\| \| \| \| \| \|	The rest of the SparkListener events are named with "SparkListener" as the prefix of the name; this commit renames the StageCompleted event to SparkListenerStageCompleted for consistency.
\| \| * \| \| \|	Updated code style according to Patrick's comments	Kay Ousterhout	2013-12-29	1	-4/+2
\| \| \| \| \| \|
\| \| * \| \| \|	Changed naming of StageCompleted event to be consistent	Kay Ousterhout	2013-12-27	7	-14/+20
\| \| \| \| \| \|	The rest of the SparkListener events are named with "SparkListener" as the prefix of the name; this commit renames the StageCompleted event to SparkListenerStageCompleted for consistency.
\| * \| \| \| \|	Merge pull request #304 from kayousterhout/remove_unused	Patrick Wendell	2013-12-28	1	-6/+0
\| \|\ \ \ \ \	Removed unused failed and causeOfFailure variables (in TaskSetManager)
\| \| * \| \| \| \|	Removed unused failed and causeOfFailure variables	Kay Ousterhout	2013-12-27	1	-6/+0
\| \| \| \| \| \| \|
* \| \| \| \| \| \|	Updated docs for SparkConf and handled review comments	Matei Zaharia	2013-12-30	9	-32/+56
\| \| \| \| \| \| \|
* \| \| \| \| \| \|	Properly show Spark properties on web UI, and change app name property	Matei Zaharia	2013-12-29	4	-9/+12
\| \| \| \| \| \| \|
* \| \| \| \| \| \|	Added tests for SparkConf and fixed a bug	Matei Zaharia	2013-12-29	3	-0/+117
\| \| \| \| \| \| \|	Typesafe Config caches system properties the first time it's invoked by default, ignoring later changes unless you do something special
* \| \| \| \| \| \|	Fix a change that was lost during merge	Matei Zaharia	2013-12-29	1	-1/+2
\| \| \| \| \| \| \|
* \| \| \| \| \| \|	Fix a few settings that were being read as system properties after merge	Matei Zaharia	2013-12-29	2	-9/+13
\| \| \| \| \| \| \|
* \| \| \| \| \| \|	Merge remote-tracking branch 'origin/master' into conf2	Matei Zaharia	2013-12-29	43	-1576/+1264
\|\\| \| \| \| \| \|	Conflicts:
\| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/SparkContext.scala
\| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
\| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
\| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
\| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
\| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
\| \| \| \| \| \| \|	core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
\| \| \| \| \| \| \|	core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
\| \| \| \| \| \| \|	new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
\| \| \| \| \| \| \|	streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
\| \| \| \| \| \| \|	streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
\| \| \| \| \| \| \|	streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
\| \| \| \| \| \| \|	streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
\| \| \| \| \| \| \|	streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
\| \| \| \| \| \| \|	streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
\| \| \| \| \| \| \|	streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
\| \| \| \| \| \| \|	streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
\| * \| \| \| \| \|	Merge pull request #307 from kayousterhout/other_failure	Matei Zaharia	2013-12-27	2	-6/+0
\| \|\ \ \ \ \ \	Removed unused OtherFailure TaskEndReason.
\| \| \| \| \| \| \| \|	The OtherFailure TaskEndReason was added by @mateiz 3 years ago in this commit: https://github.com/apache/incubator-spark/commit/24a1e7f8380bfd8d4fbdda688482a451bd6ea215
\| \| \| \| \| \| \| \|	Unless I am missing something, it doesn't seem to have been used then, and is not used now, so seems safe for deletion.
\| \| * \| \| \| \| \|	Removed unused OtherFailure TaskEndReason.	Kay Ousterhout	2013-12-27	2	-6/+0
\| \| \| \|/ / / /
\| \| \|/\| \| \| \|
\| * / \| \| \| \|	Remove unused hasPendingTasks methods	Kay Ousterhout	2013-12-27	4	-16/+0
\| \|/ / / / /
\| * \| \| \| \|	Style fixes as per Reynold's review	Kay Ousterhout	2013-12-27	1	-6/+6
\| \| \| \| \| \|
\| * \| \| \| \|	Fixed >100char lines in DAGScheduler.scala	Kay Ousterhout	2013-12-27	1	-15/+27
\| \|/ / / /
\| * \| \| \|	Merge pull request #298 from aarondav/minor	Reynold Xin	2013-12-26	1	-3/+3
\| \|\ \ \ \	Minor: Decrease margin of left side of Log page
\| \| \| \| \| \|	Before ![before](https://f.cloud.github.com/assets/1400247/1812647/1a4be53e-6e87-11e3-9d5b-f851274be0e9.png)
\| \| \| \| \| \|	After ![after](https://f.cloud.github.com/assets/1400247/1812648/1ca1ea2c-6e87-11e3-946c-31be9258f450.png)
\| \| \| \| \| \|	It's a start anyway...
\| \| * \| \| \|	Decrease margin of left side of log page	Aaron Davidson	2013-12-26	1	-3/+3
\| \| \| \| \| \|
\| * \| \| \| \|	Avoid a lump of coal (NPE) in JobProgressListener's stocking.	Mark Hamstra	2013-12-25	1	-6/+3
\| \|/ / / /
\| * \| \| \|	Merge pull request #127 from kayousterhout/consolidate_schedulers	Patrick Wendell	2013-12-24	25	-1515/+940
\| \|\ \ \ \	Deduplicate Local and Cluster schedulers.
\| \| \| \| \| \|	The code in LocalScheduler/LocalTaskSetManager was nearly identical to the code in ClusterScheduler/ClusterTaskSetManager. The redundancy made updating the schedulers unnecessarily painful and error-prone. This commit combines the two into a single TaskScheduler/TaskSetManager.
\| \| \| \| \| \|	Unfortunately the diff makes this change look much more invasive than it is -- TaskScheduler.scala is only superficially changed (names updated, overrides removed) from the old ClusterScheduler.scala, and the same with TaskSetManager.scala.
\| \| \| \| \| \|	Thanks @rxin for suggesting this change!
\| \| * \| \| \|	Responded to Reynold's style comments	Kay Ousterhout	2013-12-24	3	-6/+7
\| \| \| \| \| \|
\| \| * \| \| \|	Correctly merged in maxTaskFailures fix	Kay Ousterhout	2013-12-22	4	-5/+5
\| \| \| \| \| \|
\| \| * \| \| \|	Fix build error in test	Kay Ousterhout	2013-12-21	1	-1/+1
\| \| \| \| \| \|
\| \| * \| \| \|	Renamed ClusterScheduler to TaskSchedulerImpl	Kay Ousterhout	2013-12-20	14	-39/+39
\| \| \| \| \| \|
\| \| * \| \| \|	Merge remote branch 'upstream/master' into consolidate_schedulers	Kay Ousterhout	2013-12-20	121	-707/+951
\| \| \|\ \ \ \	Conflicts:
\| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
\| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
\| \| \| \| \| \| \|	core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
\| \| * \ \ \ \	Merge master into 127	Aaron Davidson	2013-12-08	61	-664/+2031
\| \| \|\ \ \ \ \
\| \| * \| \| \| \| \|	Fixed error message in ClusterScheduler to be consistent with the old LocalScheduler	Kay Ousterhout	2013-11-15	1	-2/+6
\| \| \| \| \| \| \| \|
\| \| * \| \| \| \| \|	Merge remote-tracking branch 'upstream/master' into consolidate_schedulers	Kay Ousterhout	2013-11-15	7	-80/+46
\| \| \|\ \ \ \ \ \	Conflicts:
\| \| \| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
\| \| * \| \| \| \| \| \|	Don't retry tasks if result wasn't serializable	Kay Ousterhout	2013-11-14	1	-1/+11
\| \| \| \| \| \| \| \| \|
\| \| * \| \| \| \| \| \|	Fix bug where scheduler could hang after task failure.	Kay Ousterhout	2013-11-14	1	-10/+3
\| \| \| \| \| \| \| \| \|	When a task fails, we need to call reviveOffers() so that the task can be rescheduled on a different machine. In the current code, the state in ClusterTaskSetManager indicating which tasks are pending may be updated after revive offers is called (there's a race condition here), so when revive offers is called, the task set manager does not yet realize that there are failed tasks that need to be relaunched.
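
The ordering problem described in this message can be shown with a deterministic toy model. The classes and method names below (`ToyTaskSetManager`, `ToyScheduler`, `on_task_failed_buggy`, `on_task_failed_fixed`) are hypothetical and exist only for illustration; the sketch replays the bad interleaving on a single thread and does not reproduce the actual change made in the commit.

```python
class ToyTaskSetManager:
    """Tracks which tasks are waiting to be (re)launched."""

    def __init__(self):
        self.pending = []

    def mark_failed(self, task_id):
        # A failed task becomes pending again so it can be relaunched.
        self.pending.append(task_id)

    def resource_offer(self):
        # Hand out the next pending task, or None if nothing is waiting.
        return self.pending.pop(0) if self.pending else None


class ToyScheduler:
    def __init__(self, manager):
        self.manager = manager
        self.launched = []

    def revive_offers(self):
        task = self.manager.resource_offer()
        if task is not None:
            self.launched.append(task)

    def on_task_failed_buggy(self, task_id):
        # Offers are revived before the manager knows about the failure,
        # so the offer finds nothing to launch and the task is stranded.
        self.revive_offers()
        self.manager.mark_failed(task_id)

    def on_task_failed_fixed(self, task_id):
        # Updating the pending state first guarantees the offer sees the task.
        self.manager.mark_failed(task_id)
        self.revive_offers()


if __name__ == "__main__":
    buggy = ToyScheduler(ToyTaskSetManager())
    buggy.on_task_failed_buggy(7)
    fixed = ToyScheduler(ToyTaskSetManager())
    fixed.on_task_failed_fixed(7)
    print(buggy.launched, fixed.launched)
```

In the buggy ordering the failed task stays in the pending list with no further offer to pick it up, which matches the hang described in the commit subject.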
\| \| * \| \| \| \| \| \|	Changed local backend to use Akka actor	Kay Ousterhout	2013-11-14	1	-23/+57
\| \| \| \| \| \| \| \| \|
\| \| * \| \| \| \| \| \|	Fixed naming issues and added back ability to specify max task failures.	Kay Ousterhout	2013-11-13	12	-124/+174
\| \| \| \| \| \| \| \| \|
\| \| * \| \| \| \| \| \|	Merge remote-tracking branch 'upstream/master' into consolidate_schedulers	Kay Ousterhout	2013-11-13	37	-507/+2019
\| \| \|\ \ \ \ \ \ \	Conflicts:
\| \| \| \| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/scheduler/ClusterScheduler.scala
\| \| * \| \| \| \| \| \| \|	Extracted TaskScheduler interface.	Kay Ousterhout	2013-11-13	14	-73/+79
\| \| \| \| \| \| \| \| \| \|	Also changed the default maximum number of task failures to be 0 when running in local mode.
\| \| * \| \| \| \| \| \| \|	Cleaned up imports and fixed test bug	Kay Ousterhout	2013-10-31	3	-7/+6
\| \| \| \| \| \| \| \| \| \|
\| \| * \| \| \| \| \| \| \|	Fixed most issues with unit tests	Kay Ousterhout	2013-10-30	5	-106/+103
\| \| \| \| \| \| \| \| \| \|
\| \| * \| \| \| \| \| \| \|	Deduplicate Local and Cluster schedulers.	Kay Ousterhout	2013-10-30	21	-1924/+1268
\| \| \| \| \| \| \| \| \| \|	The code in LocalScheduler/LocalTaskSetManager was nearly identical to the code in ClusterScheduler/ClusterTaskSetManager. The redundancy made updating the schedulers unnecessarily painful and error-prone. This commit combines the two into a single TaskScheduler/TaskSetManager.
\| * \| \| \| \| \| \| \| \|	Merge pull request #279 from aarondav/shuffle-cleanup0	Patrick Wendell	2013-12-24	3	-7/+35
\| \|\ \ \ \ \ \ \ \ \	Clean up shuffle files once their metadata is gone
\| \| \|_\|_\|_\|_\|_\|_\|_\|/
\| \|/\| \| \| \| \| \| \| \|	Previously, we would only clean the in-memory metadata for consolidated shuffle files.
\| \| \| \| \| \| \| \| \| \|	Additionally, fixes a bug where the Metadata Cleaner was ignoring type-specific TTLs.
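
What "type-specific TTLs" means for a cleaner can be sketched as a lookup that prefers a per-type setting and falls back to a general one. The function `resolve_ttl` and the key names `cleaner.ttl` and `cleaner.ttl.<type>` are assumptions made for this example, not Spark's actual property names or code.

```python
from typing import Mapping, Optional


def resolve_ttl(
    settings: Mapping[str, str],
    cleaner_type: str,
    general_key: str = "cleaner.ttl",  # hypothetical key name
) -> Optional[int]:
    """Return the TTL in seconds for cleaner_type, or None if no TTL is set."""
    # A type-specific entry, when present, wins over the general one.
    specific = settings.get(f"{general_key}.{cleaner_type}")
    if specific is not None:
        return int(specific)
    # Otherwise fall back to the general TTL shared by all cleaner types.
    general = settings.get(general_key)
    return int(general) if general is not None else None


if __name__ == "__main__":
    settings = {"cleaner.ttl": "3600", "cleaner.ttl.shuffle": "60"}
    print(resolve_ttl(settings, "shuffle"), resolve_ttl(settings, "broadcast"))
```

A cleaner that skipped the first lookup and always read the general key would behave like the bug described above: the per-type value would be silently ignored.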