author Sean Owen <sowen@cloudera.com> 2016-07-13 11:39:32 +0100
committer Sean Owen <sowen@cloudera.com> 2016-07-13 11:39:32 +0100
commit 51ade51a9fd64fc2fe651c505a286e6f29f59d40
tree c0cec499c429368efb6e1f58f3426e93722633ab /mllib
parent 3d6f679cfe5945a9f72841727342af39e9410e0a
[SPARK-16440][MLLIB] Undeleted broadcast variables in Word2Vec causing OoM for long runs
## What changes were proposed in this pull request?

Unpersist broadcasted vars in Word2Vec.fit for more timely / reliable resource cleanup

## How was this patch tested?

Jenkins tests

Author: Sean Owen <sowen@cloudera.com>

Closes #14153 from srowen/SPARK-16440.
Diffstat (limited to 'mllib')
-rw-r--r-- mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala 3
1 file changed, 3 insertions, 0 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
index f2211df3f9..6b9c8ee2e3 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
@@ -434,6 +434,9 @@ class Word2Vec extends Serializable with Logging {
       bcSyn1Global.unpersist(false)
     }
     newSentences.unpersist()
+    expTable.unpersist()
+    bcVocab.unpersist()
+    bcVocabHash.unpersist()
 
     val wordArray = vocab.map(_.word)
     new Word2VecModel(wordArray.zipWithIndex.toMap, syn0Global)
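
The cleanup pattern this patch applies can be sketched outside Word2Vec as follows. This is a minimal illustration, not code from the patch: the object `BroadcastCleanupSketch`, the helper `countKnownWords`, and the sample data are hypothetical, and the `try`/`finally` wrapper is an assumption added here (the patch itself calls `unpersist()` inline after training). Only `SparkContext.broadcast` and `Broadcast.unpersist()` are taken from the Spark API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of Spark or of this patch.
object BroadcastCleanupSketch {

  // Counts the tokens that occur in `vocab`, shipping the vocabulary to
  // executors as a broadcast variable and releasing it afterwards.
  def countKnownWords(sc: SparkContext, words: RDD[String], vocab: Set[String]): Long = {
    val bcVocab: Broadcast[Set[String]] = sc.broadcast(vocab)
    try {
      words.filter(w => bcVocab.value.contains(w)).count()
    } finally {
      // Drop the cached copies on the executors once the job is done,
      // instead of waiting for the driver-side reference to be garbage collected.
      bcVocab.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are illustrative settings only.
    val conf = new SparkConf().setAppName("broadcast-cleanup-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val words = sc.parallelize(Seq("spark", "word2vec", "unknown", "spark"))
      val known = countKnownWords(sc, words, Set("spark", "word2vec"))
      println(s"known tokens: $known")
    } finally {
      sc.stop()
    }
  }
}
```

`unpersist()` only removes the executor-side copies, so the broadcast can still be re-sent if it is used again; `destroy()` would release it permanently. In a long-running application that fits many models, releasing each broadcast explicitly in this way keeps executor memory from accumulating stale copies.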