aboutsummaryrefslogtreecommitdiff
path: root/core
Commit message (Collapse)AuthorAgeFilesLines
* Merge pull request #127 from kayousterhout/consolidate_schedulersPatrick Wendell2013-12-2425-1515/+940
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Deduplicate Local and Cluster schedulers. The code in LocalScheduler/LocalTaskSetManager was nearly identical to the code in ClusterScheduler/ClusterTaskSetManager. The redundancy made making updating the schedulers unnecessarily painful and error- prone. This commit combines the two into a single TaskScheduler/ TaskSetManager. Unfortunately the diff makes this change look much more invasive than it is -- TaskScheduler.scala is only superficially changed (names updated, overrides removed) from the old ClusterScheduler.scala, and the same with TaskSetManager.scala. Thanks @rxin for suggesting this change!
| * Responded to Reynold's style commentsKay Ousterhout2013-12-243-6/+7
| |
| * Correctly merged in maxTaskFailures fixKay Ousterhout2013-12-224-5/+5
| |
| * Fix build error in testKay Ousterhout2013-12-211-1/+1
| |
| * Renamed ClusterScheduler to TaskSchedulerImplKay Ousterhout2013-12-2014-39/+39
| |
| * Merge remote branch 'upstream/master' into consolidate_schedulersKay Ousterhout2013-12-20121-707/+951
| |\ | | | | | | | | | | | | | | | | | | Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
| * \ Merge master into 127Aaron Davidson2013-12-0861-664/+2031
| |\ \
| * | | Fixed error message in ClusterScheduler to be consistent with the old ↵Kay Ousterhout2013-11-151-2/+6
| | | | | | | | | | | | | | | | LocalScheduler
| * | | Merge remote-tracking branch 'upstream/master' into consolidate_schedulersKay Ousterhout2013-11-157-80/+46
| |\ \ \ | | | | | | | | | | | | | | | | | | | | Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
| * | | | Don't retry tasks if result wasn't serializableKay Ousterhout2013-11-141-1/+11
| | | | |
| * | | | Fix bug where scheduler could hang after task failure.Kay Ousterhout2013-11-141-10/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a task fails, we need to call reviveOffers() so that the task can be rescheduled on a different machine. In the current code, the state in ClusterTaskSetManager indicating which tasks are pending may be updated after revive offers is called (there's a race condition here), so when revive offers is called, the task set manager does not yet realize that there are failed tasks that need to be relaunched.
| * | | | Changed local backend to use Akka actorKay Ousterhout2013-11-141-23/+57
| | | | |
| * | | | Fixed naming issues and added back ability to specify max task failures.Kay Ousterhout2013-11-1312-124/+174
| | | | |
| * | | | Merge remote-tracking branch 'upstream/master' into consolidate_schedulersKay Ousterhout2013-11-1337-507/+2019
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: core/src/main/scala/org/apache/spark/scheduler/ClusterScheduler.scala
| * | | | | Extracted TaskScheduler interface.Kay Ousterhout2013-11-1314-73/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also changed the default maximum number of task failures to be 0 when running in local mode.
| * | | | | Cleaned up imports and fixed test bugKay Ousterhout2013-10-313-7/+6
| | | | | |
| * | | | | Fixed most issues with unit testsKay Ousterhout2013-10-305-106/+103
| | | | | |
| * | | | | Deduplicate Local and Cluster schedulers.Kay Ousterhout2013-10-3021-1924/+1268
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code in LocalScheduler/LocalTaskSetManager was nearly identical to the code in ClusterScheduler/ClusterTaskSetManager. The redundancy made making updating the schedulers unnecessarily painful and error- prone. This commit combines the two into a single TaskScheduler/ TaskSetManager.
* | | | | | Merge pull request #279 from aarondav/shuffle-cleanup0Patrick Wendell2013-12-243-7/+35
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up shuffle files once their metadata is gone Previously, we would only clean the in-memory metadata for consolidated shuffle files. Additionally, fixes a bug where the Metadata Cleaner was ignoring type-specific TTLs.
| * | | | | | Clean up shuffle files once their metadata is goneAaron Davidson2013-12-193-7/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we would only clean the in-memory metadata for consolidated shuffle files. Additionally, fixes a bug where the Metadata Cleaner was ignoring type- specific TTLs.
* | | | | | | Merge pull request #277 from tdas/scheduler-updateMatei Zaharia2013-12-242-6/+2
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactored the streaming scheduler and added StreamingListener interface - Refactored the streaming scheduler for cleaner code. Specifically, the JobManager was renamed to JobScheduler, as it does the actual scheduling of Spark jobs to the SparkContext. The earlier Scheduler was renamed to JobGenerator, as it actually generates the jobs from the DStreams. The JobScheduler starts the JobGenerator. Also, moved all the scheduler related code from spark.streaming to spark.streaming.scheduler package. - Implemented the StreamingListener interface, similar to SparkListener. The streaming version of StatusReportListener prints the batch processing time statistics (for now). Added StreamingListernerSuite to test it. - Refactored streaming TestSuiteBase for deduping code in the other streaming testsuites.
| * | | | | | | Minor changes.Tathagata Das2013-12-181-1/+0
| | | | | | | |
| * | | | | | | Merge branch 'apache-master' into scheduler-updateTathagata Das2013-12-18109-627/+738
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/dstream/ForEachDStream.scala
| * | | | | | | | Added StatsReportListener to generate processing time statistics across ↵Tathagata Das2013-12-181-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | multiple batches.
| * | | | | | | | Refactored streaming scheduler and added listener interface.Tathagata Das2013-12-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Refactored Scheduler + JobManager to JobGenerator + JobScheduler and added JobSet for cleaner code. Moved scheduler related code to streaming.scheduler package. - Added StreamingListener trait (similar to SparkListener) to enable gathering to streaming stats like processing times and delays. StreamingContext.addListener() to added listeners. - Deduped some code in streaming tests by modifying TestSuiteBase, and added StreamingListenerSuite.
* | | | | | | | | Merge pull request #244 from leftnoteasy/masterReynold Xin2013-12-237-5/+257
|\ \ \ \ \ \ \ \ \ | |_|_|_|_|_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added SPARK-968 implementation for review Added SPARK-968 implementation for review
| * | | | | | | | SPARK-968, added executor address showing in aggregated metrics by executors ↵wangda.tan2013-12-231-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | table
| * | | | | | | | added changes according to comments from rxinwangda.tan2013-12-227-44/+24
| | | | | | | | |
| * | | | | | | | spark-968, changes for avoid a NPEwangda.tan2013-12-172-25/+29
| | | | | | | | |
| * | | | | | | | spark-898, changes according to review commentswangda.tan2013-12-177-76/+90
| | | | | | | | |
| * | | | | | | | Merge branch 'master' of git://github.com/apache/incubator-sparkwangda.tan2013-12-16112-615/+834
| |\ \ \ \ \ \ \ \
| * | | | | | | | | SPARK-968, added sc finalize code to avoid akka rebinding to the same portwangda.tan2013-12-091-0/+7
| | | | | | | | | |
| * | | | | | | | | Merge branch 'master' of https://github.com/leftnoteasy/incubator-spark-1wangda.tan2013-12-0921-260/+651
| |\ \ \ \ \ \ \ \ \ | | | |_|_|_|_|_|_|/ | | |/| | | | | | |
| * | | | | | | | | SPARK-968, In stage UI, add an overview section that shows task stats ↵wangda.tan2013-12-095-0/+234
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | grouped by executor id
* | | | | | | | | | Merge pull request #280 from aarondav/minorPatrick Wendell2013-12-203-17/+8
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Minor cleanup for standalone scheduler See commit messages
| * | | | | | | | | | Fix compiler warning in SparkZooKeeperSessionAaron Davidson2013-12-191-0/+1
| | | | | | | | | | |
| * | | | | | | | | | Remove firstApp from the standalone scheduler MasterAaron Davidson2013-12-191-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As a lonely child with no one to care for it... we had to put it down.
| * | | | | | | | | | Extraordinarily minor code/comment cleanupAaron Davidson2013-12-192-7/+7
| | |_|_|_|_|/ / / / | |/| | | | | | | |
* | | | | | | | | | Merge pull request #272 from tmyklebu/masterPatrick Wendell2013-12-195-16/+36
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Track and report task result serialisation time. - DirectTaskResult now has a ByteBuffer valueBytes instead of a T value. - DirectTaskResult now has a member function T value() that deserialises valueBytes. - Executor serialises value into a ByteBuffer and passes it to DTR's ctor. - Executor tracks the time taken to do so and puts it in a new field in TaskMetrics. - StagePage now reports serialisation time from TaskMetrics along with the other things it reported.
| * | | | | | | | | | Add a serialisation time column to the StagePage.Tor Myklebust2013-12-181-1/+5
| | | | | | | | | | |
| * | | | | | | | | | objectSer -> valueSer in a test.Tor Myklebust2013-12-171-2/+2
| | | | | | | | | | |
| * | | | | | | | | | Missed a spot; had an objectSer here too.Tor Myklebust2013-12-171-2/+2
| | | | | | | | | | |
| * | | | | | | | | | Merge branch 'master' of git://github.com/apache/incubator-sparkTor Myklebust2013-12-163-5/+5
| |\ \ \ \ \ \ \ \ \ \
| * | | | | | | | | | | Incorporate pwendell's code review suggestions.Tor Myklebust2013-12-164-9/+8
| | | | | | | | | | | |
| * | | | | | | | | | | UI to display serialisation time of a stage.Tor Myklebust2013-12-161-0/+6
| | | | | | | | | | | |
| * | | | | | | | | | | Track task value serialisation time in TaskMetrics.Tor Myklebust2013-12-164-15/+26
| | |_|_|_|/ / / / / / | |/| | | | | | | | |
* | | | | | | | | | | Merge pull request #276 from shivaram/collectPartitionReynold Xin2013-12-193-4/+44
|\ \ \ \ \ \ \ \ \ \ \ | |_|_|/ / / / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add collectPartition to JavaRDD interface. This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py. Thanks @concretevitamin for the original change and tests.
| * | | | | | | | | | Add comment explaining collectPartitions's useShivaram Venkataraman2013-12-191-0/+2
| | | | | | | | | | |
| * | | | | | | | | | Make collectPartitions take an array of partitionsShivaram Venkataraman2013-12-192-11/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the implementation to use runJob instead of PartitionPruningRDD. Also update the unit tests and the python take implementation to use the new interface.
| * | | | | | | | | | Add collectPartition to JavaRDD interface.Shivaram Venkataraman2013-12-183-5/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also remove takePartition from PythonRDD and use collectPartition in rdd.py.