author    Wenchen Fan <wenchen@databricks.com>    2016-04-28 00:26:39 -0700
committer Reynold Xin <rxin@databricks.com>       2016-04-28 00:26:39 -0700
commit    bf5496dbdac75ea69081c95a92a29771e635ea98 (patch)
tree      b6151d946b25171b5bbcd3252aa77f7f7a69f60c /project
parent    be317d4a90b3ca906fefeb438f89a09b1c7da5a8 (diff)
[SPARK-14654][CORE] New accumulator API
## What changes were proposed in this pull request?

This PR introduces a new accumulator API that is much simpler than the old one:

1. The type hierarchy is simplified; there is now only a single `Accumulator` class.
2. The `initialValue` and `zeroValue` concepts are combined into one: `zeroValue`.
3. There is only one `register` method; accumulator registration and cleanup registration are combined.
4. The `id`, `name`, and `countFailedValues` fields are combined into an `AccumulatorMetadata`, which is provided during registration.

`SQLMetric` is a good example of the simplicity of this new API.

What we break:

1. No `setValue` anymore. In the new API the intermediate type can differ from the result type, which makes a general `setValue` very hard to implement.
2. An accumulator can't be serialized before it is registered.

Problems to be addressed in follow-ups:

1. With this new API, `AccumulatorInfo` doesn't make much sense: the partial output is no longer a set of partial updates, so we need to expose the intermediate value instead.
2. `ExceptionFailure` should not carry the accumulator updates. Why would users care about accumulator updates for failed cases? It looks like we only use this feature to update internal metrics, so how about sending a heartbeat to update internal metrics after the failure event instead?
3. The public event `SparkListenerTaskEnd` carries a `TaskMetrics`. Ideally this `TaskMetrics` wouldn't need to carry external accumulators, as the only method of `TaskMetrics` that can access external accumulators is `private[spark]`. However, `SQLListener` uses it to retrieve SQL metrics.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #12612 from cloud-fan/acc.
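To make the shape of the new API concrete, here is a minimal sketch of a user-defined accumulator. This is an illustration only: the base class is shown as `AccumulatorV2[IN, OUT]` (the name this API ultimately shipped under in Spark 2.x, which may differ from this commit's snapshot), and `StringListAccumulator` is a made-up example class, not code from this patch.

```scala
import java.util.{ArrayList, Collections, List => JList}

import org.apache.spark.util.AccumulatorV2

// Hypothetical example: the intermediate type (String) differs from the
// result type (java.util.List[String]) -- this is why a general `setValue`
// is no longer offered.
class StringListAccumulator extends AccumulatorV2[String, JList[String]] {
  private val list: JList[String] =
    Collections.synchronizedList(new ArrayList[String]())

  // The single `zeroValue` concept: an accumulator is at its zero value
  // when nothing has been added yet.
  override def isZero: Boolean = list.isEmpty

  override def copy(): StringListAccumulator = {
    val newAcc = new StringListAccumulator
    newAcc.list.addAll(list)
    newAcc
  }

  override def reset(): Unit = list.clear()

  override def add(v: String): Unit = list.add(v)

  override def merge(other: AccumulatorV2[String, JList[String]]): Unit =
    list.addAll(other.value)

  override def value: JList[String] = list
}

// Registration and cleanup registration are combined into one call;
// the id/name/countFailedValues metadata is attached at this point:
//   val acc = new StringListAccumulator
//   sc.register(acc, "collected strings")
```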
Diffstat (limited to 'project')
-rw-r--r--  project/MimaExcludes.scala  12
1 file changed, 12 insertions(+), 0 deletions(-)
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 0f8648f890..6fc49a08fe 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -688,6 +688,18 @@ object MimaExcludes {
     ) ++ Seq(
       // [SPARK-4452][Core]Shuffle data structures can starve others on the same thread for memory
       ProblemFilters.exclude[IncompatibleTemplateDefProblem]("org.apache.spark.util.collection.Spillable")
+    ) ++ Seq(
+      // SPARK-14654: New accumulator API
+      ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.ExceptionFailure$"),
+      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ExceptionFailure.apply"),
+      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ExceptionFailure.metrics"),
+      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ExceptionFailure.copy"),
+      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ExceptionFailure.this"),
+      ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.executor.ShuffleReadMetrics.remoteBlocksFetched"),
+      ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.executor.ShuffleReadMetrics.totalBlocksFetched"),
+      ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.executor.ShuffleReadMetrics.localBlocksFetched"),
+      ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.status.api.v1.ShuffleReadMetrics.remoteBlocksFetched"),
+      ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.status.api.v1.ShuffleReadMetrics.localBlocksFetched")
     )
   case v if v.startsWith("1.6") =>
     Seq(
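For readers unfamiliar with MiMa (the Migration Manager that Spark's build uses to detect binary-incompatible changes), each `ProblemFilters.exclude` entry above silences one specific compatibility report for an intentional API break. A minimal sketch of the pattern, using a made-up class name rather than anything from this patch:

```scala
import com.typesafe.tools.mima.core._

// Hypothetical example: waive MiMa reports for intentional breaks in a
// made-up public class.
val exampleExcludes = Seq(
  // The type parameter names the kind of incompatibility being waived;
  // the string names the affected member.
  ProblemFilters.exclude[DirectMissingMethodProblem]("org.example.Widget.oldMethod"),
  // Waive a changed return type on another member.
  ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.example.Widget.count")
)
```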