[SPARK-7663] [MLLIB] Add requirement for word2vec model

JIRA issue [link](https://issues.apache.org/jira/browse/SPARK-7663). Word2Vec should check the vocabulary size before training, so that an unexpectedly empty vocabulary fails with a clear message. CC srowen.

Author: Xusen Yin <yinxusen@gmail.com>

Closes #6228 from yinxusen/SPARK-7663 and squashes the following commits:

21770c5 [Xusen Yin] check the vocab size
54ae63e [Xusen Yin] add requirement for word2vec model
author: Xusen Yin <yinxusen@gmail.com> 2015-05-20 10:41:18 +0100
committer: Sean Owen <sowen@cloudera.com> 2015-05-20 10:44:06 +0100
commit: b3abf0b8d9bca13840eb759953d76905c2ba9b8a (patch)
tree: 75b9ad1e525868fb385f2398239b865350ff6089 /mllib
parent: 60336e3bc02a2587fdf315f9011bbe7c9d3a58c4 (diff)
download: spark-b3abf0b8d9bca13840eb759953d76905c2ba9b8a.tar.gz
spark-b3abf0b8d9bca13840eb759953d76905c2ba9b8a.tar.bz2
spark-b3abf0b8d9bca13840eb759953d76905c2ba9b8a.zip
1 files changed, 3 insertions, 0 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
index 731f7576c2..f65f78299d 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
@@ -158,6 +158,9 @@ class Word2Vec extends Serializable with Logging {
       .sortWith((a, b) => a.cn > b.cn)
     
     vocabSize = vocab.length
+    require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " +
+      "the setting of minCount, which could be large enough to remove all your words in sentences.")
+
     var a = 0
     while (a < vocabSize) {
       vocabHash += vocab(a).word -> a
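
The effect of the added `require` can be illustrated outside Spark. The following is a minimal stand-alone sketch, not the Spark implementation: the object `VocabCheckSketch` and the helper `buildVocab` are hypothetical names, and the counting and filtering logic is an assumption about how `minCount` prunes the vocabulary. It only shows that a `minCount` larger than every word frequency leaves the vocabulary empty, at which point `require` throws an `IllegalArgumentException` carrying the message from the patch.

```scala
// Hypothetical stand-alone sketch; it does not use Spark and is not the Spark code.
object VocabCheckSketch {

  // Hypothetical helper: count word occurrences and keep the words seen at
  // least `minCount` times, most frequent first (assumed pruning rule).
  def buildVocab(sentences: Seq[Seq[String]], minCount: Int): Seq[(String, Int)] = {
    val vocab = sentences.flatten
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .filter { case (_, count) => count >= minCount }
      .toSeq
      .sortWith((a, b) => a._2 > b._2)

    val vocabSize = vocab.length
    // Same guard as in the patch: fail fast when every word was filtered out.
    require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " +
      "the setting of minCount, which could be large enough to remove all your words in sentences.")
    vocab
  }
}
```

With the sentences `Seq(Seq("a", "b", "a"), Seq("a", "c"))`, a call with `minCount = 2` keeps only `a`, while `minCount = 10` removes every word and the call fails at the `require` line instead of continuing with an empty vocabulary.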