spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge pull request #127 from kayousterhout/consolidate_schedulers	Patrick Wendell	2013-12-24	25	-1515/+940
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Deduplicate Local and Cluster schedulers. The code in LocalScheduler/LocalTaskSetManager was nearly identical to the code in ClusterScheduler/ClusterTaskSetManager. The redundancy made making updating the schedulers unnecessarily painful and error- prone. This commit combines the two into a single TaskScheduler/ TaskSetManager. Unfortunately the diff makes this change look much more invasive than it is -- TaskScheduler.scala is only superficially changed (names updated, overrides removed) from the old ClusterScheduler.scala, and the same with TaskSetManager.scala. Thanks @rxin for suggesting this change!
\| *	Responded to Reynold's style comments	Kay Ousterhout	2013-12-24	3	-6/+7
\| \|
\| *	Correctly merged in maxTaskFailures fix	Kay Ousterhout	2013-12-22	4	-5/+5
\| \|
\| *	Fix build error in test	Kay Ousterhout	2013-12-21	1	-1/+1
\| \|
\| *	Renamed ClusterScheduler to TaskSchedulerImpl	Kay Ousterhout	2013-12-20	14	-39/+39
\| \|
\| *	Merge remote branch 'upstream/master' into consolidate_schedulers	Kay Ousterhout	2013-12-20	121	-707/+951
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
\| * \	Merge master into 127	Aaron Davidson	2013-12-08	61	-664/+2031
\| \|\ \
\| * \| \|	Fixed error message in ClusterScheduler to be consistent with the old ↵	Kay Ousterhout	2013-11-15	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	LocalScheduler
\| * \| \|	Merge remote-tracking branch 'upstream/master' into consolidate_schedulers	Kay Ousterhout	2013-11-15	7	-80/+46
\| \|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
\| * \| \| \|	Don't retry tasks if result wasn't serializable	Kay Ousterhout	2013-11-14	1	-1/+11
\| \| \| \| \|
\| * \| \| \|	Fix bug where scheduler could hang after task failure.	Kay Ousterhout	2013-11-14	1	-10/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a task fails, we need to call reviveOffers() so that the task can be rescheduled on a different machine. In the current code, the state in ClusterTaskSetManager indicating which tasks are pending may be updated after revive offers is called (there's a race condition here), so when revive offers is called, the task set manager does not yet realize that there are failed tasks that need to be relaunched.
\| * \| \| \|	Changed local backend to use Akka actor	Kay Ousterhout	2013-11-14	1	-23/+57
\| \| \| \| \|
\| * \| \| \|	Fixed naming issues and added back ability to specify max task failures.	Kay Ousterhout	2013-11-13	12	-124/+174
\| \| \| \| \|
\| * \| \| \|	Merge remote-tracking branch 'upstream/master' into consolidate_schedulers	Kay Ousterhout	2013-11-13	37	-507/+2019
\| \|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: core/src/main/scala/org/apache/spark/scheduler/ClusterScheduler.scala
\| * \| \| \| \|	Extracted TaskScheduler interface.	Kay Ousterhout	2013-11-13	14	-73/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Also changed the default maximum number of task failures to be 0 when running in local mode.
\| * \| \| \| \|	Cleaned up imports and fixed test bug	Kay Ousterhout	2013-10-31	3	-7/+6
\| \| \| \| \| \|
\| * \| \| \| \|	Fixed most issues with unit tests	Kay Ousterhout	2013-10-30	5	-106/+103
\| \| \| \| \| \|
\| * \| \| \| \|	Deduplicate Local and Cluster schedulers.	Kay Ousterhout	2013-10-30	21	-1924/+1268
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code in LocalScheduler/LocalTaskSetManager was nearly identical to the code in ClusterScheduler/ClusterTaskSetManager. The redundancy made making updating the schedulers unnecessarily painful and error- prone. This commit combines the two into a single TaskScheduler/ TaskSetManager.
* \| \| \| \| \|	Merge pull request #279 from aarondav/shuffle-cleanup0	Patrick Wendell	2013-12-24	3	-7/+35
\|\ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clean up shuffle files once their metadata is gone Previously, we would only clean the in-memory metadata for consolidated shuffle files. Additionally, fixes a bug where the Metadata Cleaner was ignoring type-specific TTLs.
\| * \| \| \| \| \|	Clean up shuffle files once their metadata is gone	Aaron Davidson	2013-12-19	3	-7/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we would only clean the in-memory metadata for consolidated shuffle files. Additionally, fixes a bug where the Metadata Cleaner was ignoring type- specific TTLs.
* \| \| \| \| \| \|	Merge pull request #277 from tdas/scheduler-update	Matei Zaharia	2013-12-24	2	-6/+2
\|\ \ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refactored the streaming scheduler and added StreamingListener interface - Refactored the streaming scheduler for cleaner code. Specifically, the JobManager was renamed to JobScheduler, as it does the actual scheduling of Spark jobs to the SparkContext. The earlier Scheduler was renamed to JobGenerator, as it actually generates the jobs from the DStreams. The JobScheduler starts the JobGenerator. Also, moved all the scheduler related code from spark.streaming to spark.streaming.scheduler package. - Implemented the StreamingListener interface, similar to SparkListener. The streaming version of StatusReportListener prints the batch processing time statistics (for now). Added StreamingListernerSuite to test it. - Refactored streaming TestSuiteBase for deduping code in the other streaming testsuites.
\| * \| \| \| \| \| \|	Minor changes.	Tathagata Das	2013-12-18	1	-1/+0
\| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \|	Merge branch 'apache-master' into scheduler-update	Tathagata Das	2013-12-18	109	-627/+738
\| \|\ \ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/dstream/ForEachDStream.scala
\| * \| \| \| \| \| \| \|	Added StatsReportListener to generate processing time statistics across ↵	Tathagata Das	2013-12-18	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	multiple batches.
\| * \| \| \| \| \| \| \|	Refactored streaming scheduler and added listener interface.	Tathagata Das	2013-12-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Refactored Scheduler + JobManager to JobGenerator + JobScheduler and added JobSet for cleaner code. Moved scheduler related code to streaming.scheduler package. - Added StreamingListener trait (similar to SparkListener) to enable gathering to streaming stats like processing times and delays. StreamingContext.addListener() to added listeners. - Deduped some code in streaming tests by modifying TestSuiteBase, and added StreamingListenerSuite.
* \| \| \| \| \| \| \| \|	Merge pull request #244 from leftnoteasy/master	Reynold Xin	2013-12-23	7	-5/+257
\|\ \ \ \ \ \ \ \ \ \| \|_\|_\|_\|_\|_\|_\|_\|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added SPARK-968 implementation for review Added SPARK-968 implementation for review
\| * \| \| \| \| \| \| \|	SPARK-968, added executor address showing in aggregated metrics by executors ↵	wangda.tan	2013-12-23	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	table
\| * \| \| \| \| \| \| \|	added changes according to comments from rxin	wangda.tan	2013-12-22	7	-44/+24
\| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \|	spark-968, changes for avoid a NPE	wangda.tan	2013-12-17	2	-25/+29
\| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \|	spark-898, changes according to review comments	wangda.tan	2013-12-17	7	-76/+90
\| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \|	Merge branch 'master' of git://github.com/apache/incubator-spark	wangda.tan	2013-12-16	112	-615/+834
\| \|\ \ \ \ \ \ \ \
\| * \| \| \| \| \| \| \| \|	SPARK-968, added sc finalize code to avoid akka rebinding to the same port	wangda.tan	2013-12-09	1	-0/+7
\| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \|	Merge branch 'master' of https://github.com/leftnoteasy/incubator-spark-1	wangda.tan	2013-12-09	21	-260/+651
\| \|\ \ \ \ \ \ \ \ \ \| \| \| \|_\|_\|_\|_\|_\|_\|/ \| \| \|/\| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \|	SPARK-968, In stage UI, add an overview section that shows task stats ↵	wangda.tan	2013-12-09	5	-0/+234
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	grouped by executor id
* \| \| \| \| \| \| \| \| \|	Merge pull request #280 from aarondav/minor	Patrick Wendell	2013-12-20	3	-17/+8
\|\ \ \ \ \ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Minor cleanup for standalone scheduler See commit messages
\| * \| \| \| \| \| \| \| \| \|	Fix compiler warning in SparkZooKeeperSession	Aaron Davidson	2013-12-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \|	Remove firstApp from the standalone scheduler Master	Aaron Davidson	2013-12-19	1	-10/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As a lonely child with no one to care for it... we had to put it down.
\| * \| \| \| \| \| \| \| \| \|	Extraordinarily minor code/comment cleanup	Aaron Davidson	2013-12-19	2	-7/+7
\| \| \|_\|_\|_\|_\|/ / / / \| \|/\| \| \| \| \| \| \| \|
* \| \| \| \| \| \| \| \| \|	Merge pull request #272 from tmyklebu/master	Patrick Wendell	2013-12-19	5	-16/+36
\|\ \ \ \ \ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Track and report task result serialisation time. - DirectTaskResult now has a ByteBuffer valueBytes instead of a T value. - DirectTaskResult now has a member function T value() that deserialises valueBytes. - Executor serialises value into a ByteBuffer and passes it to DTR's ctor. - Executor tracks the time taken to do so and puts it in a new field in TaskMetrics. - StagePage now reports serialisation time from TaskMetrics along with the other things it reported.
\| * \| \| \| \| \| \| \| \| \|	Add a serialisation time column to the StagePage.	Tor Myklebust	2013-12-18	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \|	objectSer -> valueSer in a test.	Tor Myklebust	2013-12-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \|	Missed a spot; had an objectSer here too.	Tor Myklebust	2013-12-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \|	Merge branch 'master' of git://github.com/apache/incubator-spark	Tor Myklebust	2013-12-16	3	-5/+5
\| \|\ \ \ \ \ \ \ \ \ \
\| * \| \| \| \| \| \| \| \| \| \|	Incorporate pwendell's code review suggestions.	Tor Myklebust	2013-12-16	4	-9/+8
\| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \|	UI to display serialisation time of a stage.	Tor Myklebust	2013-12-16	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \|	Track task value serialisation time in TaskMetrics.	Tor Myklebust	2013-12-16	4	-15/+26
\| \| \|_\|_\|_\|/ / / / / / \| \|/\| \| \| \| \| \| \| \| \|
* \| \| \| \| \| \| \| \| \| \|	Merge pull request #276 from shivaram/collectPartition	Reynold Xin	2013-12-19	3	-4/+44
\|\ \ \ \ \ \ \ \ \ \ \ \| \|_\|_\|/ / / / / / / / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add collectPartition to JavaRDD interface. This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py. Thanks @concretevitamin for the original change and tests.
\| * \| \| \| \| \| \| \| \| \|	Add comment explaining collectPartitions's use	Shivaram Venkataraman	2013-12-19	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \|	Make collectPartitions take an array of partitions	Shivaram Venkataraman	2013-12-19	2	-11/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change the implementation to use runJob instead of PartitionPruningRDD. Also update the unit tests and the python take implementation to use the new interface.
\| * \| \| \| \| \| \| \| \| \|	Add collectPartition to JavaRDD interface.	Shivaram Venkataraman	2013-12-18	3	-5/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Also remove takePartition from PythonRDD and use collectPartition in rdd.py.