[SPARK-15317][CORE] Don't store accumulators for every task in listeners - spark

author	Shixiong Zhu <shixiong@databricks.com>	2016-05-19 12:05:17 -0700
committer	Andrew Or <andrew@databricks.com>	2016-05-19 12:05:17 -0700
commit	4e3cb7a5d965fd490390398ecfe35f1fc05e8511 (patch)
tree	aa7b9e92c8196e6d02fbed6e8a5dd1ec3754de0c /python
parent	6ac1c3a040f88fae15c46acd73e7e3691f7d3619 (diff)

[SPARK-15317][CORE] Don't store accumulators for every task in listeners

## What changes were proposed in this pull request?

In general, the Web UI doesn't need to store the Accumulator/AccumulableInfo for every task; it only needs the Accumulator values. This PR creates new UIData classes to store the necessary fields and makes `JobProgressListener` store only these new classes, so that `JobProgressListener` no longer stores Accumulator/AccumulableInfo and its retained size becomes much smaller. It also eliminates `AccumulableInfo` from `SQLListener` so that we don't keep any references to those unused `AccumulableInfo`s.

## How was this patch tested?

I ran the two tests reported in JIRA locally.

The first one is:

```
val data = spark.range(0, 10000, 1, 10000)
data.cache().count()
```

The retained size of JobProgressListener decreases from 60.7M to 6.9M.

The second one is:

```
import org.apache.spark.ml.CC
import org.apache.spark.sql.SQLContext
val sqlContext = SQLContext.getOrCreate(sc)
CC.runTest(sqlContext)
```

This test no longer causes an OOM after applying this patch.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #13153 from zsxwing/memory.
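The general idea described above (retain a slim, UI-only record per task instead of the full accumulator object) can be sketched roughly as follows. This is an illustration only, not the classes this patch introduces: `FullAccumulatorInfo`, `SlimAccumulatorUIData`, `SlimListener`, and all of their fields are hypothetical names made up for the example, and the choice to skip unnamed or valueless accumulators is an assumption of the sketch.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a heavyweight per-task accumulator record.
// It is NOT Spark's AccumulableInfo; the fields are assumptions for illustration.
final case class FullAccumulatorInfo(
    id: Long,
    name: Option[String],
    update: Option[Any],
    value: Option[Any],
    internal: Boolean,
    metadata: Option[String])

// Hypothetical slim record that keeps only what a UI page would render.
final case class SlimAccumulatorUIData(id: Long, name: String, value: String)

object AccumulatorConversions {
  // Copy the displayable fields out of the full record.
  // Assumption of this sketch: accumulators without a name or value are skipped.
  def toSlim(info: FullAccumulatorInfo): Option[SlimAccumulatorUIData] =
    for {
      name <- info.name
      value <- info.value
    } yield SlimAccumulatorUIData(info.id, name, value.toString)
}

// Hypothetical listener that retains only slim records, so the full
// records passed to onTaskEnd can be garbage collected afterwards.
final class SlimListener {
  private val byTask = mutable.HashMap.empty[Long, Seq[SlimAccumulatorUIData]]

  def onTaskEnd(taskId: Long, infos: Seq[FullAccumulatorInfo]): Unit =
    byTask(taskId) = infos.flatMap(info => AccumulatorConversions.toSlim(info).toList)

  def accumulators(taskId: Long): Seq[SlimAccumulatorUIData] =
    byTask.getOrElse(taskId, Seq.empty)
}
```

Because the listener holds no reference to the full records after `onTaskEnd` returns, its retained size in this sketch depends only on the slim fields, which is the effect the description above attributes to the new UIData classes.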


