author: Andrew Or <andrewor14@gmail.com> 2014-04-22 19:24:03 -0700
committer: Patrick Wendell <pwendell@gmail.com> 2014-04-22 19:24:03 -0700
commit: 2de573877fbed20092f1b3af20b603b30ba9a940
tree: 66810110d77db0e4d5316cab69e98bbd9c6f89f2 (/streaming/src)
parent: 995fdc96bcd2c540804401eaab009a777d7d7aa9
[Spark-1538] Fix SparkUI incorrectly hiding persisted RDDs
**Bug**: After running `sc.parallelize(1 to 1000).persist.map(_ + 1).count()`, the persisted RDD is missing from the storage tab of the SparkUI.

**Cause**: The command creates two RDDs in one stage, a `ParallelCollectionRDD` and a `MappedRDD`. However, the existing StageInfo keeps only the RDDInfo of the last RDD associated with the stage (`MappedRDD`), so all information about the first RDD (`ParallelCollectionRDD`) is discarded. In this case we persist the first RDD, but the StorageTab doesn't know about it because it is not encoded in the StageInfo.

**Fix**: Record information for all RDDs in StageInfo, instead of just the last RDD (i.e. `stage.rdd`). Since stage boundaries are marked by shuffle dependencies, the solution is to traverse the last RDD's dependency tree, visiting only ancestor RDDs related through a sequence of narrow dependencies.

---

This PR also moves RDDInfo to its own file, includes a few style fixes, and adds a unit test for constructing StageInfos.

Author: Andrew Or <andrewor14@gmail.com>

Closes #469 from andrewor14/storage-ui-fix and squashes the following commits:

07fc7f0 [Andrew Or] Add back comment that was accidentally removed (minor)
5d799fe [Andrew Or] Add comment to justify testing of getNarrowAncestors with cycles
9d0e2b8 [Andrew Or] Hide details of getNarrowAncestors from outsiders
d2bac8a [Andrew Or] Deal with cycles in RDD dependency graph + add extensive tests
2acb177 [Andrew Or] Move getNarrowAncestors to RDD.scala
bfe83f0 [Andrew Or] Backtrace RDD dependency tree to find all RDDs that belong to a Stage
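
The traversal described in the fix can be illustrated with a small, self-contained sketch. This is not the code from the patch: `Node`, `Narrow`, `Shuffle`, and `NarrowAncestors.narrowAncestors` are hypothetical stand-ins for Spark's RDD and dependency types, chosen so the example runs without Spark. It assumes the intended behaviour is to follow only narrow dependencies from the last RDD of a stage, stop at shuffle dependencies, tolerate cycles, and exclude the starting RDD from the result.

```scala
import scala.collection.mutable

// Toy stand-ins for Spark's dependency types (hypothetical, not the real API).
sealed trait Dep { def parent: Node }
final case class Narrow(parent: Node) extends Dep   // stays within the stage
final case class Shuffle(parent: Node) extends Dep  // marks a stage boundary

// Toy stand-in for an RDD; `deps` is mutable only so tests can build cycles.
final class Node(val name: String, var deps: List[Dep] = Nil)

object NarrowAncestors {
  // Collect every ancestor reachable from `root` through narrow dependencies only.
  def narrowAncestors(root: Node): Seq[Node] = {
    // The visited set doubles as the result and guards against cycles.
    val visited = mutable.LinkedHashSet.empty[Node]

    def visit(node: Node): Unit = node.deps.foreach {
      case Narrow(parent) if !visited.contains(parent) =>
        visited += parent
        visit(parent)
      case _ => () // shuffle dependency or already-visited parent: do not descend
    }

    visit(root)
    // A cycle may lead back to the root; it is not its own ancestor.
    visited.filter(_ ne root).toSeq
  }
}
```

With a `MappedRDD` node that narrowly depends on a `ParallelCollectionRDD` node, as in the bug report, `NarrowAncestors.narrowAncestors` returns the `ParallelCollectionRDD` node, which is the RDD the old StageInfo dropped. A node whose only dependency is a `Shuffle` yields an empty result, because its parent belongs to a different stage.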
Diffstat (limited to 'streaming/src')
0 files changed, 0 insertions, 0 deletions