[SPARK-13048][ML][MLLIB] keepLastCheckpoint option for LDA EM optimizer

## What changes were proposed in this pull request?

The EMLDAOptimizer should generally not delete its last checkpoint, since doing so can cause failures when DistributedLDAModel methods are called (if any partitions need to be recovered from the checkpoint).

This PR adds a "deleteLastCheckpoint" option which defaults to false. This is a change in behavior from Spark 1.6: the last checkpoint will no longer be removed by default.

This involves adding the deleteLastCheckpoint option to both spark.ml and spark.mllib, and modifying PeriodicCheckpointer to support the option.

This also:
* Makes MLlibTestSparkContext extend TempDirectory and set the checkpointDir to tempDir
* Updates LibSVMRelationSuite because of a name conflict with "tempDir" (and fixes a bug where it failed to delete a temp directory)
* Adds a MiMa exclude for the DistributedLDAModel constructor, which is already `private[clustering]`

## How was this patch tested?

Added 2 new unit tests to spark.ml LDASuite, which calls into spark.mllib.

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #12166 from jkbradley/emlda-save-checkpoint.
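To make the motivation concrete, here is a hypothetical, self-contained Scala model of the keep-or-delete decision. It is not Spark's `PeriodicCheckpointer` or `EMLDAOptimizer` code: the class `ToyPeriodicCheckpointer`, its `update` and `deleteAllCheckpoints` methods, and the string "checkpoint files" are illustrative assumptions. It shows why the last checkpoint matters: intermediate checkpoints can be dropped once a newer one exists, but the final one is the only recovery source left for a model built from the last iteration.

```scala
import scala.collection.mutable

// Hypothetical toy model of periodic checkpointing; NOT Spark's PeriodicCheckpointer.
// "Checkpoint files" are plain strings, and deletion just records the name.
final class ToyPeriodicCheckpointer(checkpointInterval: Int) {
  // Checkpoints still present in storage, oldest first.
  private val checkpointQueue = mutable.Queue.empty[String]
  // Names of checkpoints removed so far (kept only so the sketch can be inspected).
  val deleted: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]
  private var updateCount = 0

  // Call once per iteration with an identifier for the newest dataset.
  def update(name: String): Unit = {
    updateCount += 1
    if (checkpointInterval > 0 && updateCount % checkpointInterval == 0) {
      checkpointQueue.enqueue(s"checkpoint-$name")
      // A newer checkpoint now exists, so older ones are no longer needed for recovery.
      while (checkpointQueue.size > 1) deleted += checkpointQueue.dequeue()
    }
  }

  // Call when training ends. keepLast = true leaves the newest checkpoint in place
  // so a model that still depends on it can recover lost partitions.
  def deleteAllCheckpoints(keepLast: Boolean): Unit = {
    val toKeep = if (keepLast) 1 else 0
    while (checkpointQueue.size > toKeep) deleted += checkpointQueue.dequeue()
  }

  def remaining: List[String] = checkpointQueue.toList
}

val cp = new ToyPeriodicCheckpointer(checkpointInterval = 2)
(1 to 5).foreach(i => cp.update(s"iter$i"))
cp.deleteAllCheckpoints(keepLast = true)
println(cp.remaining) // List(checkpoint-iter4)
```

In this sketch, calling `deleteAllCheckpoints(keepLast = false)` corresponds to the Spark 1.6 behavior described above, and `keepLast = true` corresponds to the new default of leaving the last checkpoint in place.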
author: Joseph K. Bradley <joseph@databricks.com> 2016-04-07 19:48:33 -0700
committer: Joseph K. Bradley <joseph@databricks.com> 2016-04-07 19:48:33 -0700
commit: 953ff897e422570a329d0aec98d573d3fb66ab9a (patch)
tree: c71211492dc024e469a834c07440713e78f7c981 /project
parent: 692c74840bc53debbb842db5372702f58207412c (diff)
download: spark-953ff897e422570a329d0aec98d573d3fb66ab9a.tar.gz spark-953ff897e422570a329d0aec98d573d3fb66ab9a.tar.bz2 spark-953ff897e422570a329d0aec98d573d3fb66ab9a.zip
1 file changed, 3 insertions, 0 deletions
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index fbadc563b8..a53161dc9a 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -614,6 +614,9 @@ object MimaExcludes {
       ) ++ Seq(
         // [SPARK-13430][ML] moved featureCol from LinearRegressionModelSummary to LinearRegressionSummary
         ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.ml.regression.LinearRegressionSummary.this")
+      ) ++ Seq(
+        // [SPARK-13048][ML][MLLIB] keepLastCheckpoint option for LDA EM optimizer
+        ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.clustering.DistributedLDAModel.this")
       )
     case v if v.startsWith("1.6") =>
       Seq(