author    Reynold Xin <rxin@databricks.com>    2016-04-15 15:39:39 -0700
committer Reynold Xin <rxin@databricks.com>    2016-04-15 15:39:39 -0700
commit    8028a28885dbd90f20e38922240618fc310a0a65 (patch)
tree      2a303488b198fdb417af37cfa6ad981b988f94ee /project
parent    90b46e014a60069bd18754b02fce056d8f4d1b3e (diff)
[SPARK-14628][CORE] Simplify task metrics by always tracking read/write metrics
## What changes were proposed in this pull request?

Part of the reason TaskMetrics and its callers are complicated is the optional metrics we collect, including input, output, shuffle read, and shuffle write. We can instead always track them and assign 0 as the initial values. It is usually obvious whether a task is supposed to read any data or not, and by always tracking the metrics we can remove a lot of map, foreach, flatMap, and getOrElse(0L) calls throughout Spark.

This patch also changes a few behaviors:

1. Removed the distinction of data read/write methods (e.g. Hadoop, Memory, Network, etc.).
2. Accumulate all data reads and writes, rather than only the first method. (Fixes SPARK-5225.)

## How was this patch tested?

Existing tests. This is based on https://github.com/apache/spark/pull/12388, with more test fixes.

Author: Reynold Xin <rxin@databricks.com>
Author: Wenchen Fan <wenchen@databricks.com>

Closes #12417 from cloud-fan/metrics-refactor.
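To illustrate the pattern the message describes, here is a minimal Scala sketch (class and member names are simplified stand-ins, not the actual Spark code): optional metrics force Option handling at every call site, while always-tracked metrics initialized to zero let callers read values directly.

```scala
// Hypothetical sketch of the simplification; illustrative names only.
class InputMetrics {
  private var _bytesRead: Long = 0L        // always tracked, starts at 0
  def bytesRead: Long = _bytesRead
  def incBytesRead(v: Long): Unit = _bytesRead += v
}

class TaskMetrics {
  // Before: optional metrics forced Option handling at every call site:
  //   val inputMetricsOpt: Option[InputMetrics] = None
  //   val bytes = inputMetricsOpt.map(_.bytesRead).getOrElse(0L)

  // After: the metrics object always exists, so callers read it directly.
  val inputMetrics: InputMetrics = new InputMetrics
  def bytesRead: Long = inputMetrics.bytesRead
}
```

Because every read path now accumulates into the same always-present counters, multiple read methods within one task are summed rather than only the first being recorded, which is the SPARK-5225 fix noted above.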
Diffstat (limited to 'project')
-rw-r--r--    project/MimaExcludes.scala    5
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 71f337ce1f..7730823f94 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -630,7 +630,10 @@ object MimaExcludes {
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.TaskContext.getLocalProperty"),
// [SPARK-14617] Remove deprecated APIs in TaskMetrics
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.executor.InputMetrics$"),
- ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.executor.OutputMetrics$")
+ ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.executor.OutputMetrics$"),
+ // [SPARK-14628] Simplify task metrics by always tracking read/write metrics
+ ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.InputMetrics.readMethod"),
+ ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.OutputMetrics.writeMethod")
)
case v if v.startsWith("1.6") =>
Seq(
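For context on the exclusions added above, here is a minimal sketch of how MiMa binary-compatibility filters are typically declared with sbt-mima-plugin (an assumed generic build setup, not Spark's actual build definition):

```scala
// build.sbt sketch (assumed setup, not Spark's actual build).
// sbt-mima-plugin compares the current artifact against a previous
// release and reports binary incompatibilities; filters suppress
// breakages that are known and intentional.
import com.typesafe.tools.mima.core._

mimaPreviousArtifacts := Set("org.example" %% "example-core" % "1.0.0")
mimaBinaryIssueFilters ++= Seq(
  // readMethod/writeMethod were removed on purpose in SPARK-14628,
  // so MiMa should not fail the build when it notices they are gone.
  ProblemFilters.exclude[DirectMissingMethodProblem](
    "org.apache.spark.executor.InputMetrics.readMethod"),
  ProblemFilters.exclude[DirectMissingMethodProblem](
    "org.apache.spark.executor.OutputMetrics.writeMethod")
)
```

Each filter names the kind of incompatibility (here, a method missing from the new version) and the fully qualified member it applies to, so the build still fails on any other, unintended breaking change.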