author: uncleGen <hustyugm@gmail.com> 2014-08-27 10:32:13 -0700
committer: Andrew Or <andrewor14@gmail.com> 2014-08-27 10:33:13 -0700
commit: 8f8e2a4ee7419a96196727704695f5114da5b84e
tree: e31f524923e2bd88cad756a6710a22cb8fb1a2ca /python
parent: 1d468df33c7b8680af12fcdb66ed91f48c80cae3
[SPARK-3170][CORE][BUG]:RDD info loss in "StorageTab" and "ExecutorTab"
A completed stage only needs to remove its own partitions that are no longer cached. However, "StorageTab" may lose some RDDs that are actually cached. Besides "StorageTab", "ExecutorTab" may also lose RDD info that has been overwritten by the last RDD in the same task.

1. "StorageTab": when multiple stages run simultaneously, a completed stage removes RDD info that belongs to other stages that are still running.
2. "ExecutorTab": the TaskContext may lose some "updatedBlocks" info of RDDs in a dependency chain, as in the following example:

    val r1 = sc.parallelize(..).cache()
    val r2 = r1.map(...).cache()
    val n = r2.count()

When r2 is counted, both r1 and r2 end up cached. So in CacheManager.getOrCompute, the TaskContext should contain the "updatedBlocks" of both r1 and r2. Currently, "updatedBlocks" only contains the info of r2.

Author: uncleGen <hustyugm@gmail.com>

Closes #2131 from uncleGen/master_ui_fix and squashes the following commits:

a6a8a0b [uncleGen] fix some coding style
3a1bc15 [uncleGen] fix some error in unit test
56ea488 [uncleGen] there's some line too long
c82ba82 [uncleGen] Bug Fix: RDD info loss in "StorageTab" and "ExecutorTab"

(cherry picked from commit d8298c46b7bf566d1cd2f7ea9b1b2b2722dcfb17)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
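The "updatedBlocks" loss described in the commit message can be illustrated with a minimal, Spark-free sketch. It is not the actual patch: BlockUpdate, TaskMetricsSketch, recordOverwriting, recordAccumulating, and the block IDs are hypothetical stand-ins chosen for illustration. The sketch only contrasts overwriting the per-task list with appending to it when two cached RDDs in one dependency chain report their blocks within the same task.

```scala
// Hypothetical, simplified stand-ins; these are NOT Spark's real classes.
final case class BlockUpdate(blockId: String, status: String)

final class TaskMetricsSketch {
  // None until the first cached block is recorded for this task.
  var updatedBlocks: Option[Seq[BlockUpdate]] = None
}

object UpdatedBlocksSketch {
  // Overwriting: each call replaces whatever an earlier RDD recorded.
  def recordOverwriting(m: TaskMetricsSketch, updates: Seq[BlockUpdate]): Unit =
    m.updatedBlocks = Some(updates)

  // Accumulating: append to what earlier RDDs in the chain already recorded.
  def recordAccumulating(m: TaskMetricsSketch, updates: Seq[BlockUpdate]): Unit = {
    val previous = m.updatedBlocks.getOrElse(Seq.empty[BlockUpdate])
    m.updatedBlocks = Some(previous ++ updates)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative block IDs for r1 and r2 (assumed values).
    val r1Blocks = Seq(BlockUpdate("rdd_1_0", "cached"))
    val r2Blocks = Seq(BlockUpdate("rdd_2_0", "cached"))

    val lossy = new TaskMetricsSketch
    recordOverwriting(lossy, r1Blocks)
    recordOverwriting(lossy, r2Blocks) // r1's entry is lost here
    println(lossy.updatedBlocks.get.map(_.blockId).mkString(", "))

    val complete = new TaskMetricsSketch
    recordAccumulating(complete, r1Blocks)
    recordAccumulating(complete, r2Blocks) // both entries are kept
    println(complete.updatedBlocks.get.map(_.blockId).mkString(", "))
  }
}
```

With the overwriting variant only the last RDD's block survives, which matches the symptom reported for "ExecutorTab"; the accumulating variant keeps one entry per cached RDD in the chain.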
Diffstat (limited to 'python')
0 files changed, 0 insertions, 0 deletions