Bug Fix: missing unpersist call in RandomForest.scala

While training Gradient Boosting Decision Trees on large-scale sparse data, Spark spilled large volumes of data to disk. The cause was the following bug: in version 1.1.0, the train method in DecisionTree.scala persists treeInput in memory but never unpersists it, which caused heavy disk usage. In the GitHub version (possibly 1.2.0), the train method in RandomForest.scala likewise persists baggedInput without unpersisting it. After adding the unpersist call, training works correctly.

https://issues.apache.org/jira/browse/SPARK-3918

Author: omgteam <Kimlong.Liu@gmail.com>

Closes #2775 from omgteam/master and squashes the following commits:

815d543 [omgteam] adjust tab to spaces
1a36f83 [omgteam] Bug: fix without unpersist baggedInput in RandomForest.scala
author: omgteam <Kimlong.Liu@gmail.com> 2014-10-13 09:59:41 -0700
committer: Xiangrui Meng <meng@databricks.com> 2014-10-13 09:59:41 -0700
commit: 942847fd94c920f7954ddf01f97263926e512b0e (patch)
tree: b60c81cd8310bda92cb3d7ac8178d22d247f938f /mllib
parent: 92e017fb894be1e8e2b2b5274fec4c31a7a4412e (diff)
download: spark-942847fd94c920f7954ddf01f97263926e512b0e.tar.gz
spark-942847fd94c920f7954ddf01f97263926e512b0e.tar.bz2
spark-942847fd94c920f7954ddf01f97263926e512b0e.zip
1 files changed, 2 insertions, 0 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala b/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala
index fa7a26f17c..ebbd8e0257 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala
@@ -176,6 +176,8 @@ private class RandomForest (
       timer.stop("findBestSplits")
     }
 
+    baggedInput.unpersist()
+
     timer.stop("total")
 
     logInfo("Internal timing for DecisionTree:")
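
The change above pairs the earlier persist of baggedInput with an explicit unpersist once training is done. A minimal standalone sketch of that persist/unpersist pairing follows. It is not the code from RandomForest.scala: the helper `withPersisted`, the object name, the sample data, and the `MEMORY_AND_DISK` storage level are all assumptions made for illustration, and it assumes a reasonably recent Spark version on the classpath running in local mode.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object UnpersistSketch {
  // Hypothetical helper (not part of Spark): marks `rdd` as persisted while
  // `body` runs and releases the cached blocks afterwards, even if `body` throws.
  def withPersisted[T, R](rdd: RDD[T], level: StorageLevel)(body: RDD[T] => R): R = {
    rdd.persist(level)
    try body(rdd)
    finally rdd.unpersist()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("unpersist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Illustrative data only; stands in for a training input reused across passes.
      val input = sc.parallelize(1 to 1000).map(_.toDouble)

      // Two actions reuse the cached RDD, as an iterative trainer would.
      val (sum, count) = withPersisted(input, StorageLevel.MEMORY_AND_DISK) { cached =>
        (cached.sum(), cached.count())
      }
      println(s"mean = ${sum / count}")

      // Once the helper returns, the RDD is no longer marked for caching.
      assert(input.getStorageLevel == StorageLevel.NONE)
    } finally {
      sc.stop()
    }
  }
}
```

Wrapping the release in `finally` is a design choice of this sketch; the patch itself simply calls `baggedInput.unpersist()` after the training loop.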