aboutsummaryrefslogtreecommitdiff
path: root/core/src/main/scala
diff options
context:
space:
mode:
authorReynold Xin <rxin@apache.org>2014-09-06 19:06:30 -0700
committerMatei Zaharia <matei@databricks.com>2014-09-06 19:06:30 -0700
commit3fb57a0ab3d76fda2301dbe9f2f3fa6743b4ed78 (patch)
tree12db31afe1ca762b95c5b4862a97f55abf0592e2 /core/src/main/scala
parent110fb8b24d2454ad7c979c3934dbed87650f17b8 (diff)
downloadspark-3fb57a0ab3d76fda2301dbe9f2f3fa6743b4ed78.tar.gz
spark-3fb57a0ab3d76fda2301dbe9f2f3fa6743b4ed78.tar.bz2
spark-3fb57a0ab3d76fda2301dbe9f2f3fa6743b4ed78.zip
[SPARK-3353] parent stage should have lower stage id.
Previously parent stages had higher stage id, but parent stages are executed first. This pull request changes the behavior so parent stages would have lower stage id. For example, command: ```scala sc.parallelize(1 to 10).map(x=>(x,x)).reduceByKey(_+_).count ``` breaks down into 2 stages. The old web UI: ![screen shot 2014-09-04 at 12 42 44 am](https://cloud.githubusercontent.com/assets/323388/4146177/60fb4f42-3407-11e4-819f-853eb0e22b25.png) Web UI with this patch: ![screen shot 2014-09-04 at 12 44 55 am](https://cloud.githubusercontent.com/assets/323388/4146178/62e08e62-3407-11e4-867b-a36b10534464.png) Author: Reynold Xin <rxin@apache.org> Closes #2273 from rxin/lower-stage-id and squashes the following commits: abbb4c6 [Reynold Xin] Fixed SparkListenerSuite. 0e02379 [Reynold Xin] Updated DAGSchedulerSuite. 54ccea3 [Reynold Xin] [SPARK-3353] parent stage should have lower stage id.
Diffstat (limited to 'core/src/main/scala')
-rw-r--r--core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala4
1 files changed, 2 insertions, 2 deletions
diff --git a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
index 2ccc27324a..6fcf9e3154 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@@ -241,9 +241,9 @@ class DAGScheduler(
callSite: CallSite)
: Stage =
{
+ val parentStages = getParentStages(rdd, jobId)
val id = nextStageId.getAndIncrement()
- val stage =
- new Stage(id, rdd, numTasks, shuffleDep, getParentStages(rdd, jobId), jobId, callSite)
+ val stage = new Stage(id, rdd, numTasks, shuffleDep, parentStages, jobId, callSite)
stageIdToStage(id) = stage
updateJobIdStageIdMaps(jobId, stage)
stage