author Andrew Or <andrew@databricks.com> 2015-08-03 10:58:37 -0700
committer Tathagata Das <tathagata.das1565@gmail.com> 2015-08-03 10:58:37 -0700
commit b41a32718d615b304efba146bf97be0229779b01
tree 657d1474da2a14485b6106cef8089af775f86dbb /project
parent 69f5a7c934ac553ed52c00679b800bcffe83c1d6
[SPARK-1855] Local checkpointing
Certain use cases of Spark involve RDDs with long lineages that must be truncated periodically (e.g. GraphX). The existing way of doing it is through `rdd.checkpoint()`, which is expensive because it writes to HDFS. This patch provides an alternative to truncate lineages cheaply *without providing the same level of fault tolerance*.

**Local checkpointing** writes checkpointed data to the local file system through the block manager. It is much faster than replicating to a reliable storage and provides the same semantics as long as executors do not fail. It is accessible through a new operator `rdd.localCheckpoint()` and leaves the old one unchanged. Users may even decide to combine the two and call the reliable one less frequently.

The bulk of this patch involves refactoring the checkpointing interface to accept custom implementations of checkpointing. [Design doc](https://issues.apache.org/jira/secure/attachment/12741708/SPARK-7292-design.pdf).

Author: Andrew Or <andrew@databricks.com>

Closes #7279 from andrewor14/local-checkpoint and squashes the following commits:

729600f [Andrew Or] Oops, fix tests
34bc059 [Andrew Or] Avoid computing all partitions in local checkpoint
e43bbb6 [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint
3be5aea [Andrew Or] Address comments
bf846a6 [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint
ab003a3 [Andrew Or] Fix compile
c2e111b [Andrew Or] Address comments
33f167a [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint
e908a42 [Andrew Or] Fix tests
f5be0f3 [Andrew Or] Use MEMORY_AND_DISK as the default local checkpoint level
a92657d [Andrew Or] Update a few comments
e58e3e3 [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint
4eb6eb1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint
1bbe154 [Andrew Or] Simplify LocalCheckpointRDD
48a9996 [Andrew Or] Avoid traversing dependency tree + rewrite tests
62aba3f [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint
db70dc2 [Andrew Or] Express local checkpointing through caching the original RDD
87d43c6 [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint
c449b38 [Andrew Or] Fix style
4a182f3 [Andrew Or] Add fine-grained tests for local checkpointing
53b363b [Andrew Or] Rename a few more awkwardly named methods (minor)
e4cf071 [Andrew Or] Simplify LocalCheckpointRDD + docs + clean ups
4880deb [Andrew Or] Fix style
d096c67 [Andrew Or] Fix mima
172cb66 [Andrew Or] Fix mima?
e53d964 [Andrew Or] Fix style
56831c5 [Andrew Or] Add a few warnings and clear exception messages
2e59646 [Andrew Or] Add local checkpoint clean up tests
4dbbab1 [Andrew Or] Refactor CheckpointSuite to test local checkpointing
4514dc9 [Andrew Or] Clean local checkpoint files through RDD cleanups
0477eec [Andrew Or] Rename a few methods with awkward names (minor)
2e902e5 [Andrew Or] First implementation of local checkpointing
8447454 [Andrew Or] Fix tests
4ac1896 [Andrew Or] Refactor checkpoint interface for modularity
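
The commit message suggests combining the two operators, calling the reliable one less often. A minimal sketch of what that might look like in an iterative job follows. It is not taken from the patch: the object name, the checkpoint directory, and the intervals (every 5 and every 25 iterations) are illustrative assumptions, and it presumes `spark-core` from a release that includes this change is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical driver program; names and intervals are assumptions.
object LocalCheckpointSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("local-checkpoint-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Only the reliable rdd.checkpoint() needs a checkpoint directory.
    // The path here is a placeholder.
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    var rdd: RDD[Long] = sc.parallelize(1L to 1000L)

    for (i <- 1 to 50) {
      // Each iteration adds one more step to the lineage.
      rdd = rdd.map(_ + 1)

      if (i % 25 == 0) {
        // Infrequent reliable checkpoint: survives executor loss.
        rdd.checkpoint()
        rdd.count() // an action is needed to materialize the checkpoint
      } else if (i % 5 == 0) {
        // Frequent local checkpoint: cheap, but lost if an executor fails.
        rdd.localCheckpoint()
        rdd.count()
      }
    }

    // Inspect the lineage that remains after the last truncation.
    println(rdd.toDebugString)
    sc.stop()
  }
}
```

Both operators only mark the RDD; the lineage is cut once an action runs on it, which is why each branch calls `count()`.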
Diffstat (limited to 'project')
-rw-r--r-- project/MimaExcludes.scala 9
1 files changed, 7 insertions, 2 deletions
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index f9384c4c3c..280aac9319 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -80,8 +80,13 @@ object MimaExcludes {
"org.apache.spark.mllib.linalg.Matrix.numActives")
) ++ Seq(
// SPARK-8914 Remove RDDApi
- ProblemFilters.exclude[MissingClassProblem](
- "org.apache.spark.sql.RDDApi")
+ ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.RDDApi")
+ ) ++ Seq(
+ // SPARK-7292 Provide operator to truncate lineage cheaply
+ ProblemFilters.exclude[AbstractClassProblem](
+ "org.apache.spark.rdd.RDDCheckpointData"),
+ ProblemFilters.exclude[AbstractClassProblem](
+ "org.apache.spark.rdd.CheckpointRDD")
) ++ Seq(
// SPARK-8701 Add input metadata in the batch page.
ProblemFilters.exclude[MissingClassProblem](