path: root/core
Commit message | Author | Age | Files | Lines
* Merge pull request #190 from markhamstra/Stages4Jobs [Matei Zaharia, 2013-12-06, 9 files, +280/-91]

    stageId <--> jobId mapping in DAGScheduler

    Okay, I think this one is ready to go -- or at least it's ready for review and discussion. It's a carry-over of https://github.com/mesos/spark/pull/842 with updates for the newer job cancellation functionality. The prior discussion still applies.

    I've actually changed the job cancellation flow a bit: instead of ``cancelTasks`` going to the TaskScheduler and then ``taskSetFailed`` coming back to the DAGScheduler (resulting in ``abortStage`` there), the DAGScheduler now takes care of figuring out which stages should be cancelled, tells the TaskScheduler to cancel tasks for those stages, then does the cleanup within the DAGScheduler directly, without the need for any further prompting by the TaskScheduler.

    I know of three outstanding issues, each of which can and should, I believe, be handled in follow-up pull requests:

    1) https://spark-project.atlassian.net/browse/SPARK-960
    2) JobLogger should be refactored to eliminate duplication.
    3) Related to 2), the WebUI should also become a consumer of the DAGScheduler's new understanding of the relationship between jobs and stages, so that it can display progress indication and the like grouped by job. Right now, some of this information is just being sent out as part of ``SparkListenerJobStart`` messages, but more or different job <--> stage information may need to be exported from the DAGScheduler to meet listeners' needs.

    Except for the eventQueue -> Actor commit, the rest can be cherry-picked almost cleanly into branch-0.8. A little merging is needed in MapOutputTracker and the DAGScheduler. Merged versions of those files are in https://github.com/markhamstra/incubator-spark/tree/aba2b40ce04ee9b7b9ea260abb6f09e050142d43

    Note that between the recent Actor change in the DAGScheduler and the cleaning up of DAGScheduler data structures on job completion in this PR, some races have been introduced into the DAGSchedulerSuite. Those tests usually pass, and I don't think that better-behaved code that doesn't directly inspect DAGScheduler data structures should be seeing any problems, but I'll work on fixing DAGSchedulerSuite as either an addition to this PR or as a separate request.

    UPDATE: Fixed the race that I introduced. Created a JIRA issue (SPARK-965) for the one that was introduced with the switch to eventProcessorActor in the DAGScheduler.
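The bidirectional mapping and cleanup described above can be sketched in miniature. This is an illustrative Python sketch of the bookkeeping idea only, not Spark's actual Scala code; all names (`StageJobIndex`, `stages_to_cancel`) are hypothetical:

```python
# Sketch: keep stageId <-> jobId maps in sync so that cancelling a job can
# find exactly the stages that belong only to it, and both maps are cleaned
# up when a job completes.
from collections import defaultdict

class StageJobIndex:
    def __init__(self):
        self.stage_to_jobs = defaultdict(set)  # stageId -> {jobId, ...}
        self.job_to_stages = defaultdict(set)  # jobId -> {stageId, ...}

    def register(self, stage_id, job_id):
        self.stage_to_jobs[stage_id].add(job_id)
        self.job_to_stages[job_id].add(stage_id)

    def stages_to_cancel(self, job_id):
        # A stage is only safe to cancel if no OTHER active job depends on it.
        return {s for s in self.job_to_stages.get(job_id, set())
                if self.stage_to_jobs[s] == {job_id}}

    def remove_job(self, job_id):
        # Cleanup on job completion: drop the job from both maps, and drop
        # any stage that no longer belongs to any job.
        for stage_id in self.job_to_stages.pop(job_id, set()):
            jobs = self.stage_to_jobs[stage_id]
            jobs.discard(job_id)
            if not jobs:
                del self.stage_to_jobs[stage_id]
```

Keeping the two maps tightly coupled (every write goes through `register`/`remove_job`) is what makes the "cleaned up on job completion" guarantee checkable.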
  * SparkListenerJobStart posted from local jobs [Mark Hamstra, 2013-12-03, 1 file, +1/-0]

  * Synchronous, inline cleanup after runLocally [Mark Hamstra, 2013-12-03, 3 files, +6/-13]

  * Local jobs post SparkListenerJobEnd, and DAGScheduler data structure [Mark Hamstra, 2013-12-03, 2 files, +11/-8]

        cleanup always occurs before any posting of SparkListenerJobEnd.

  * Tightly couple stageIdToJobIds and jobIdToStageIds [Mark Hamstra, 2013-12-03, 1 file, +12/-17]

  * Cleaned up job cancellation handling [Mark Hamstra, 2013-12-03, 1 file, +5/-7]

  * Refactoring to make job removal, stage removal, task cancellation clearer [Mark Hamstra, 2013-12-03, 1 file, +39/-37]

  * Improved comment [Mark Hamstra, 2013-12-03, 1 file, +3/-4]

  * Removed redundant residual re: reverted refactoring. [Mark Hamstra, 2013-12-03, 1 file, +1/-1]

  * Fixed intended side-effects [Mark Hamstra, 2013-12-03, 1 file, +2/-2]

  * Actor instead of eventQueue for LocalJobCompleted [Mark Hamstra, 2013-12-03, 1 file, +1/-1]

  * Added stageId <--> jobId mapping in DAGScheduler [Mark Hamstra, 2013-12-03, 9 files, +286/-88]

        ...and make sure that DAGScheduler data structures are cleaned up on job completion. Initial effort and discussion at https://github.com/mesos/spark/pull/842
* Merge pull request #233 from hsaputra/changecontexttobackend [Matei Zaharia, 2013-12-06, 1 file, +2/-2]

    Change the name of the input argument in ClusterScheduler#initialize from context to backend. The SchedulerBackend used to be called ClusterSchedulerContext, so this just makes a small change to the input param in ClusterScheduler#initialize to reflect that.

  * Change the name of the input argument in ClusterScheduler#initialize from context to backend. [Henry Saputra, 2013-12-05, 1 file, +2/-2]

        The SchedulerBackend used to be called ClusterSchedulerContext, so this just makes a small change to the input param in ClusterScheduler#initialize to reflect that.
* Merge pull request #205 from kayousterhout/logging [Matei Zaharia, 2013-12-06, 1 file, +34/-2]

    Added logging of scheduler delays to UI

    This commit adds two metrics to the UI:
    1) The time to get task results, if they're fetched remotely
    2) The scheduler delay

    When the scheduler starts getting overwhelmed (because it can't keep up with the rate at which tasks are being submitted), the result is that tasks get delayed on the tail end: the message from the worker saying that the task has completed ends up in a long queue and takes a while to be processed by the scheduler. This commit records that delay in the UI so that users can tell when the scheduler is becoming the bottleneck.

  * Fixed problem with scheduler delay [Kay Ousterhout, 2013-12-02, 1 file, +7/-4]

  * Added logging of scheduler delays to UI [Kay Ousterhout, 2013-11-21, 1 file, +31/-2]
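The scheduler-delay metric described above can be approximated as "everything the scheduler observed that wasn't task execution or result fetching". A hedged sketch of that arithmetic; the function and parameter names are illustrative, not Spark's actual TaskMetrics API:

```python
# Sketch: scheduler delay is the portion of a task's wall-clock time, as seen
# by the scheduler, not accounted for by running the task or fetching its
# result. A large value suggests the scheduler itself is the bottleneck.
def scheduler_delay(launch_time, finish_time, executor_run_time,
                    result_fetch_time=0):
    total = finish_time - launch_time
    delay = total - executor_run_time - result_fetch_time
    return max(delay, 0)  # clamp: clock skew can make this slightly negative
```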
* Merge pull request #220 from rxin/zippart [Matei Zaharia, 2013-12-06, 1 file, +11/-16]

    Memoize preferred locations in ZippedPartitionsBaseRDD so preferred location computation doesn't lead to exponential explosion.

    This was a problem in GraphX, where we have a whole chain of RDDs that are ZippedPartitionsRDDs, and the preferred locations were taking an eternity to compute.

    (cherry picked from commit e36fe55a031d2c01c9d7c5d85965951c681a0c74)
    Signed-off-by: Reynold Xin <rxin@apache.org>

  * Memoize preferred locations in ZippedPartitionsBaseRDD so preferred location computation doesn't lead to exponential explosion. [Reynold Xin, 2013-11-30, 1 file, +11/-16]

        (cherry picked from commit e36fe55a031d2c01c9d7c5d85965951c681a0c74)
        Signed-off-by: Reynold Xin <rxin@apache.org>
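The memoization idea behind #220 can be shown with a toy tree. This is an illustrative Python sketch, not the ZippedPartitionsBaseRDD code; the class names and the intersection rule for combining locations are assumptions for the example:

```python
# Sketch: in a chain of zipped nodes, recomputing children's preferred
# locations on every query can blow up when subtrees are revisited; caching
# the result once per node makes each node pay its cost exactly once.
class Zipped:
    def __init__(self, left, right):
        self.left, self.right = left, right
        self._locs = None  # memoized preferred locations

    def preferred_locations(self):
        if self._locs is None:
            self._locs = sorted(set(self.left.preferred_locations())
                                & set(self.right.preferred_locations()))
        return self._locs

class Leaf:
    calls = 0  # counts how often leaf locations are actually computed

    def __init__(self, locs):
        self.locs = locs

    def preferred_locations(self):
        Leaf.calls += 1
        return self.locs
```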
* FutureAction result tests [Mark Hamstra, 2013-12-05, 1 file, +26/-0]

* jobWaiter.synchronized before jobWaiter.wait [Mark Hamstra, 2013-12-05, 2 files, +2/-1]
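The invariant behind "jobWaiter.synchronized before jobWaiter.wait" is the classic monitor rule: you must hold the object's lock before calling `wait()`, or the runtime rejects the call (IllegalMonitorStateException on the JVM). A Python analog using `threading.Condition`, which enforces the same rule:

```python
# Sketch: Condition.wait() must be called with the lock held, and the wait
# belongs in a loop that rechecks the predicate (spurious wakeups).
import threading

cond = threading.Condition()

def wait_for_result(done, timeout=0.05):
    with cond:                     # the equivalent of jobWaiter.synchronized
        while not done():
            cond.wait(timeout=timeout)
```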
* Merge pull request #228 from pwendell/master [Patrick Wendell, 2013-12-05, 2 files, +13/-3]

    Document missing configs and set shuffle consolidation to false.

  * Forcing shuffle consolidation in DiskBlockManagerSuite [Patrick Wendell, 2013-12-05, 1 file, +12/-2]

  * Document missing configs and set shuffle consolidation to false. [Patrick Wendell, 2013-12-04, 1 file, +1/-1]
* Merge pull request #199 from harveyfeng/yarn-2.2 [Matei Zaharia, 2013-12-04, 2 files, +4/-8]

    Hadoop 2.2 migration

    Includes support for the YARN API stabilized in the Hadoop 2.2 release, and a few style patches. Short description for each set of commits:

    a98f5a0 - "Misc style changes in the 'yarn' package"
    a67ebf4 - "A few more style fixes in the 'yarn' package"
    Both of these are some minor style changes, such as fixing lines over 100 chars, to the existing YARN code.

    ab8652f - "Add a 'new-yarn' directory ..."
    Copies everything from `SPARK_HOME/yarn` to `SPARK_HOME/new-yarn`. No actual code changes here.

    4f1c3fa - "Hadoop 2.2 YARN API migration ..."
    API patches to code in the `SPARK_HOME/new-yarn` directory. There are a few more small style changes mixed in, too. Based on @colorant's Hadoop 2.2 support for the scala-2.10 branch in #141.

    a1a1c62 - "Add optional Hadoop 2.2 settings in sbt build ..."
    If Spark should be built against Hadoop 2.2, then:
    a) The `org.apache.spark.deploy.yarn` package will be compiled from the `new-yarn` directory.
    b) Protobuf v2.5 will be used as a Spark dependency, since Hadoop 2.2 depends on it. Also, Spark will be built against a version of Akka v2.0.5 that's built against Protobuf 2.5, named `akka-2.0.5-protobuf-2.5`. The patched Akka is here: https://github.com/harveyfeng/akka/tree/2.0.5-protobuf-2.5, and was published to local Ivy during testing.

    There's also a new boolean environment variable, `SPARK_IS_NEW_HADOOP`, that users can manually set if their `SPARK_HADOOP_VERSION` specification does not start with `2.2`, which is how the build file tries to detect a 2.2 version. Not sure if this is necessary or done in the best way, though...

  * Fix pom.xml for maven build [Raymond Liu, 2013-12-03, 1 file, +3/-7]

  * Merge remote-tracking branch 'origin/master' into yarn-2.2 [Harvey Feng, 2013-11-26, 23 files, +984/-190]

        Conflicts:
            yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala

  * Hadoop 2.2 YARN API migration for `SPARK_HOME/new-yarn` [Harvey Feng, 2013-11-23, 1 file, +1/-1]
* Merge pull request #227 from pwendell/master [Patrick Wendell, 2013-12-04, 1 file, +13/-16]

    Fix small bug in web UI and minor clean-up. There was a bug where sorting order didn't work correctly for write time metrics. I also cleaned up some earlier code that fixed the same issue for read and write bytes.

  * Fix small bug in web UI and minor clean-up. [Patrick Wendell, 2013-12-04, 1 file, +13/-16]

        There was a bug where sorting order didn't work correctly for write time metrics. I also cleaned up some earlier code that fixed the same issue for read and write bytes.

* Add missing space after "Serialized" in StorageLevel [Andrew Ash, 2013-12-04, 1 file, +1/-1]

    Current code creates outputs like:

        scala> res0.getStorageLevel.description
        res2: String = Serialized1x Replicated
* Merge pull request #223 from rxin/transient [Matei Zaharia, 2013-12-04, 1 file, +5/-5]

    Mark partitioner, name, and generator field in RDD as @transient, as part of the effort to reduce serialized task size.

  * Marked doCheckpointCalled as transient. [Reynold Xin, 2013-12-03, 1 file, +2/-2]

  * Mark partitioner, name, and generator field in RDD as @transient. [Reynold Xin, 2013-12-02, 1 file, +3/-3]
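Scala's `@transient` excludes a field from Java serialization, which is how these commits shrink the serialized task. A rough Python analog (not Spark code; the `Task` class and its field names are invented for the example) uses the pickle state hooks to skip fields that can be dropped or recomputed:

```python
# Sketch: exclude heavyweight or recomputable fields from pickling, the way
# @transient excludes fields from Java serialization.
import pickle

class Task:
    TRANSIENT = {"name", "partitioner"}  # fields skipped when serializing

    def __init__(self, data, name, partitioner):
        self.data, self.name, self.partitioner = data, name, partitioner

    def __getstate__(self):
        # Only non-transient fields travel over the wire.
        return {k: v for k, v in self.__dict__.items()
                if k not in self.TRANSIENT}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.name = None         # transient fields come back unset
        self.partitioner = None

roundtripped = pickle.loads(pickle.dumps(Task([1, 2], "big-name", object())))
```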
* Merge pull request #217 from aarondav/mesos-urls [Reynold Xin, 2013-12-02, 3 files, +260/-115]

    Re-enable zk:// urls for Mesos SparkContexts

    This was broken in PR #71 when we explicitly disallowed anything that didn't fit a mesos:// url. Although it is not really clear that a zk:// url should match Mesos, it is what the docs say and it is necessary for backwards compatibility.

    Additionally, added a unit test for the creation of all types of TaskSchedulers. Since YARN and Mesos are not necessarily available in the system, they are allowed to pass as long as the YARN/Mesos code paths are exercised.

  * Add spaces between tests [Aaron Davidson, 2013-11-29, 1 file, +5/-0]

  * Add unit test for SparkContext scheduler creation [Aaron Davidson, 2013-11-28, 3 files, +255/-116]

        Since YARN and Mesos are not necessarily available in the system, they are allowed to pass as long as the YARN/Mesos code paths are exercised.

  * Re-enable zk:// urls for Mesos SparkContexts [Aaron Davidson, 2013-11-28, 1 file, +6/-5]

        This was broken in PR #71 when we explicitly disallowed anything that didn't fit a mesos:// url. Although it is not really clear that a zk:// url should match Mesos, it is what the docs say and it is necessary for backwards compatibility.
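The dispatch rule this PR restores can be sketched as a master-URL matcher. This is a simplified illustration in Python; the regexes and the `backend_for` function are hypothetical, not Spark's actual SparkContext pattern-match:

```python
# Sketch: both mesos:// and zk:// master URLs route to the Mesos backend,
# because the docs say a ZooKeeper URL identifies a Mesos HA master.
import re

MESOS_URL = re.compile(r"^(mesos|zk)://.+$")

def backend_for(master):
    if master == "local" or master.startswith("local["):
        return "local"
    if master.startswith("spark://"):
        return "standalone"
    if MESOS_URL.match(master):
        return "mesos"
    raise ValueError(f"Could not parse master URL: {master!r}")
```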
* Merge pull request #219 from sundeepn/schedulerexception [Reynold Xin, 2013-12-01, 1 file, +11/-1]

    Scheduler quits when newStage fails

    The current scheduler thread does not handle exceptions from newStage while launching new jobs. The thread fails on any exception that gets triggered at that level, leaving the cluster hanging with no scheduler.

  * Log exception in scheduler in addition to passing it to the caller. [Sundeep Narravula, 2013-12-01, 1 file, +4/-2]

        Code styling changes.

  * Scheduler quits when createStage fails. [Sundeep Narravula, 2013-11-30, 1 file, +9/-1]

        The current scheduler thread does not handle exceptions from createStage while launching new jobs. The thread fails on any exception that gets triggered at that level, leaving the cluster hanging with no scheduler.
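The shape of the fix is to catch the stage-creation exception inside the event handler, log it, and fail only the offending job rather than letting it kill the scheduler thread. A hedged sketch with invented names (`handle_job_submitted`, `fail_job`); the real code lives in DAGScheduler's event processing:

```python
# Sketch: an exception while building stages aborts just this job; the
# scheduler loop itself keeps running.
def handle_job_submitted(job, new_stage, fail_job, log):
    try:
        stage = new_stage(job)
    except Exception as e:
        log(f"Creating new stage failed for job {job}: {e!r}")
        fail_job(job, e)  # report the failure back to the caller
        return None
    return stage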
* More comments [Lian, Cheng, 2013-11-29, 1 file, +3/-0]

* Updated some inline comments in DAGScheduler [Lian, Cheng, 2013-11-29, 1 file, +26/-5]

* Bugfix: SPARK-965 & SPARK-966 [Lian, Cheng, 2013-11-28, 3 files, +40/-25]

    SPARK-965: https://spark-project.atlassian.net/browse/SPARK-965
    SPARK-966: https://spark-project.atlassian.net/browse/SPARK-966

    * Add back DAGScheduler.start(); eventProcessActor is created and started here. Notice that this function is only called by SparkContext.
    * Cancel the scheduled stage resubmission task when stopping eventProcessActor.
    * Add a new DAGSchedulerEvent, ResubmitFailedStages. This event message is sent by the scheduled stage resubmission task to eventProcessActor. In this way, DAGScheduler.resubmitFailedStages is guaranteed to be executed from the same thread that runs DAGScheduler.processEvent. Please refer to the discussion in SPARK-966 for details.
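The ResubmitFailedStages pattern, posting an event instead of calling the method from the timer thread, can be sketched with a plain queue. Illustrative Python only; the queue-based loop stands in for the Akka actor, and the names are hypothetical:

```python
# Sketch: the timer thread posts a message; the actual resubmission runs on
# the single event-processing thread, alongside every other scheduler event,
# so no DAGScheduler state is touched from a foreign thread.
import queue

events = queue.Queue()

def timer_fired():
    events.put("ResubmitFailedStages")  # cheap, thread-safe post

def process_one(resubmit_failed_stages):
    event = events.get()
    if event == "ResubmitFailedStages":
        resubmit_failed_stages()        # executes on the event-loop thread
```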
* Merge pull request #210 from haitaoyao/http-timeout [Matei Zaharia, 2013-11-27, 1 file, +8/-2]

    add http timeout for httpbroadcast

    While pulling task bytecode from the HttpBroadcast server, there is no timeout value set. This may cause the Spark executor code to hang while other tasks in the same executor process wait for the lock. I have encountered the issue in my cluster. Here's the stacktrace I captured: https://gist.github.com/haitaoyao/7655830. So add a timeout value to ensure the task fails fast.

  * add http timeout for httpbroadcast [haitao.yao, 2013-11-26, 1 file, +8/-2]
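The fix amounts to passing a connect/read timeout on the broadcast fetch. HttpBroadcast itself is Scala over `java.net.URLConnection`; this Python sketch shows the equivalent behavior, with `fetch_broadcast` as an invented name:

```python
# Sketch: with a timeout set, a wedged broadcast server makes the fetch fail
# fast with an exception instead of blocking the executor thread forever.
import urllib.request

def fetch_broadcast(url, timeout_secs=60):
    # urlopen raises URLError/socket.timeout if the server is unreachable or
    # stops responding within timeout_secs.
    with urllib.request.urlopen(url, timeout=timeout_secs) as resp:
        return resp.read()
```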
* Merge pull request #146 from JoshRosen/pyspark-custom-serializers [Matei Zaharia, 2013-11-26, 1 file, +45/-104]

    Custom Serializers for PySpark

    This pull request adds support for custom serializers to PySpark. For now, all Python-transformed (or parallelize()d) RDDs are serialized with the same serializer that's specified when creating SparkContext. For now, PySpark includes `PickleSerDe` and `MarshalSerDe` classes for using Python's `pickle` and `marshal` serializers. It's pretty easy to add support for other serializers, although I still need to add instructions on this.

    A few notable changes:
    - The Scala `PythonRDD` class no longer manipulates Pickled objects; data from `textFile` is written to Python as MUTF-8 strings. The Python code performs the appropriate bookkeeping to track which deserializer should be used when reading an underlying JavaRDD. This mechanism could also be used to support other data exchange formats, such as MsgPack.
    - Several magic numbers were refactored into constants.
    - Batching is implemented by wrapping / decorating an unbatched SerDe.

  * Send PySpark commands as bytes instead of strings. [Josh Rosen, 2013-11-10, 1 file, +4/-20]

  * Add custom serializer support to PySpark. [Josh Rosen, 2013-11-10, 1 file, +1/-22]

        For now, this only adds MarshalSerializer, but it lays the groundwork for supporting other custom serializers. Many of these mechanisms can also be used to support deserialization of different data formats sent by Java, such as data encoded by MsgPack. This also fixes a bug in SparkContext.union().

  * Remove Pickle-wrapping of Java objects in PySpark. [Josh Rosen, 2013-11-03, 1 file, +39/-67]

        If we support custom serializers, the Python worker will know what type of input to expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and strings.
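The "batching by wrapping an unbatched SerDe" point above can be sketched directly: the batched serializer groups elements and delegates each batch to the inner serializer. The class names here are illustrative; PySpark's real implementations live in `pyspark/serializers.py`:

```python
# Sketch: a decorator-style BatchedSerializer that wraps any unbatched
# serializer, amortizing per-item serialization overhead across a batch.
import pickle

class PickleSerializer:
    def dumps(self, obj):
        return pickle.dumps(obj)

    def loads(self, data):
        return pickle.loads(data)

class BatchedSerializer:
    def __init__(self, inner, batch_size=2):
        self.inner, self.batch_size = inner, batch_size

    def dump_stream(self, iterator):
        # Group items into batches; each batch is serialized as one unit.
        batch, chunks = [], []
        for item in iterator:
            batch.append(item)
            if len(batch) == self.batch_size:
                chunks.append(self.inner.dumps(batch))
                batch = []
        if batch:
            chunks.append(self.inner.dumps(batch))
        return chunks

    def load_stream(self, chunks):
        # Deserialize each batch and flatten back into a stream of items.
        for chunk in chunks:
            yield from self.inner.loads(chunk)
```

Because batching is a wrapper, any serializer with `dumps`/`loads` (pickle, marshal, or a custom one) gets batching for free.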