author | Andrew Or <andrewor14@gmail.com> | 2014-04-22 19:24:03 -0700 |
---|---|---|
committer | Patrick Wendell <pwendell@gmail.com> | 2014-04-22 19:24:03 -0700 |
commit | 2de573877fbed20092f1b3af20b603b30ba9a940 (patch) | |
tree | 66810110d77db0e4d5316cab69e98bbd9c6f89f2 /graphx/data | |
parent | 995fdc96bcd2c540804401eaab009a777d7d7aa9 (diff) | |
[Spark-1538] Fix SparkUI incorrectly hiding persisted RDDs
**Bug**: After running `sc.parallelize(1 to 1000).persist.map(_ + 1).count()`, the persisted RDD is missing from the storage tab of the SparkUI.
**Cause**: The command creates two RDDs in one stage, a `ParallelCollectionRDD` and a `MappedRDD`. However, the existing StageInfo only keeps the RDDInfo of the last RDD associated with the stage (`MappedRDD`), and so all RDD information regarding the first RDD (`ParallelCollectionRDD`) is discarded. In this case, we persist the first RDD, but the StorageTab doesn't know about this RDD because it is not encoded in the StageInfo.
**Fix**: Record information of all RDDs in StageInfo, instead of just the last RDD (i.e. `stage.rdd`). Since stage boundaries are marked by shuffle dependencies, the solution is to traverse the last RDD's dependency tree, visiting only ancestor RDDs related through a sequence of narrow dependencies.
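The traversal described in the fix can be sketched on a toy dependency graph. This is only an illustration under stated assumptions, not the code in this PR: `Node`, `Narrow`, `Shuffle`, and `NarrowAncestors.narrowAncestors` are hypothetical stand-ins rather than Spark classes, and excluding the starting node from the result is a choice made for this sketch. It follows narrow edges only, stops at shuffle edges, and tracks visited nodes so that a cyclic graph still terminates.

```scala
// Toy model of an RDD dependency graph. All names here are hypothetical
// and exist only for illustration; they are not Spark's classes.
sealed trait Dep { def parent: Node }
final case class Narrow(parent: Node) extends Dep
final case class Shuffle(parent: Node) extends Dep

final class Node(val name: String) {
  // Mutable so that a cyclic graph can be constructed after creation.
  var deps: List[Dep] = Nil
}

object NarrowAncestors {
  /** Ancestors of `root` reachable through narrow dependencies only. */
  def narrowAncestors(root: Node): Set[Node] = {
    val visited = scala.collection.mutable.Set.empty[Node]

    def visit(node: Node): Unit = {
      node.deps.foreach {
        case Narrow(p) if !visited.contains(p) =>
          visited += p // mark before recursing so cycles terminate
          visit(p)
        case _ => // shuffle dependency (stage boundary) or already visited
      }
    }

    visit(root)
    // In this sketch the starting node is not counted as its own ancestor.
    visited.toSet - root
  }
}
```

For the command in the bug report, a node standing in for `MappedRDD` with a single narrow edge to a node standing in for `ParallelCollectionRDD` would yield a one-element set containing the latter, which is the RDD that was previously dropped from the StageInfo.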
---
This PR also moves RDDInfo to its own file, includes a few style fixes, and adds a unit test for constructing StageInfos.
Author: Andrew Or <andrewor14@gmail.com>
Closes #469 from andrewor14/storage-ui-fix and squashes the following commits:
07fc7f0 [Andrew Or] Add back comment that was accidentally removed (minor)
5d799fe [Andrew Or] Add comment to justify testing of getNarrowAncestors with cycles
9d0e2b8 [Andrew Or] Hide details of getNarrowAncestors from outsiders
d2bac8a [Andrew Or] Deal with cycles in RDD dependency graph + add extensive tests
2acb177 [Andrew Or] Move getNarrowAncestors to RDD.scala
bfe83f0 [Andrew Or] Backtrace RDD dependency tree to find all RDDs that belong to a Stage