author: Shixiong Zhu <shixiong@databricks.com> 2016-05-19 12:05:17 -0700
committer: Andrew Or <andrew@databricks.com> 2016-05-19 12:05:17 -0700
commit: 4e3cb7a5d965fd490390398ecfe35f1fc05e8511
tree: aa7b9e92c8196e6d02fbed6e8a5dd1ec3754de0c /python
parent: 6ac1c3a040f88fae15c46acd73e7e3691f7d3619
[SPARK-15317][CORE] Don't store accumulators for every task in listeners
## What changes were proposed in this pull request?

In general, the Web UI doesn't need to store the Accumulator/AccumulableInfo for every task; it only needs the Accumulator values. This PR creates new UIData classes that store only the necessary fields and makes `JobProgressListener` store only these new classes, so that `JobProgressListener` no longer stores Accumulator/AccumulableInfo and its size becomes much smaller. It also eliminates `AccumulableInfo` from `SQLListener` so that we don't keep any references to those unused `AccumulableInfo`s.

## How was this patch tested?

I ran the two tests reported in JIRA locally.

The first one is:

```
val data = spark.range(0, 10000, 1, 10000)
data.cache().count()
```

The retained size of JobProgressListener decreases from 60.7M to 6.9M.

The second one is:

```
import org.apache.spark.ml.CC
import org.apache.spark.sql.SQLContext
val sqlContext = SQLContext.getOrCreate(sc)
CC.runTest(sqlContext)
```

This test no longer causes an OOM after applying this patch.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #13153 from zsxwing/memory.
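
As a rough illustration of the idea behind the new UIData classes (keep only the displayable accumulator values and drop the full per-task record), here is a minimal, Spark-free sketch. All names in it (`FullAccumulableInfo`, `SlimAccumulatorData`, `SlimListener`) are hypothetical stand-ins invented for this example; they are not the classes added by this patch, and the actual fields retained by the patch may differ.

```scala
// Hypothetical stand-in for a heavyweight per-task record.
// This is NOT Spark's AccumulableInfo; the fields are assumptions for illustration.
final case class FullAccumulableInfo(
    id: Long,
    name: Option[String],
    update: Option[Any],
    value: Option[Any],
    internal: Boolean,
    countFailedValues: Boolean,
    metadata: Option[String])

// Hypothetical slim record holding only what a UI would need to render.
final case class SlimAccumulatorData(id: Long, name: String, value: String)

object SlimAccumulatorData {
  // Copy only the displayable fields; unnamed or valueless accumulators are skipped.
  def from(info: FullAccumulableInfo): Option[SlimAccumulatorData] =
    for {
      name  <- info.name
      value <- info.value
    } yield SlimAccumulatorData(info.id, name, value.toString)
}

// Hypothetical listener that retains slim records instead of the full ones,
// so the full records become eligible for garbage collection after the callback.
final class SlimListener {
  private val byTask =
    scala.collection.mutable.LinkedHashMap.empty[Long, Seq[SlimAccumulatorData]]

  def onTaskEnd(taskId: Long, infos: Seq[FullAccumulableInfo]): Unit =
    byTask(taskId) = infos.flatMap(info => SlimAccumulatorData.from(info))

  def accumulators(taskId: Long): Seq[SlimAccumulatorData] =
    byTask.getOrElse(taskId, Seq.empty)
}
```

The design choice this sketch tries to show is that the listener converts at the point of ingestion, so no reference to the larger record survives in long-lived listener state.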
Diffstat (limited to 'python')
0 files changed, 0 insertions, 0 deletions