From 6718c1eb671faaf5c1d865ad5d01dbf78dae9cd2 Mon Sep 17 00:00:00 2001 From: Alok Singh Date: Mon, 6 Jul 2015 21:53:55 -0700 Subject: [SPARK-5562] [MLLIB] LDA should handle empty document. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit See the jira https://issues.apache.org/jira/browse/SPARK-5562 Author: Alok Singh Author: Alok Singh Author: Alok Singh <“singhal@us.ibm.com”> Closes #7064 from aloknsingh/aloknsingh_SPARK-5562 and squashes the following commits: 259a0a7 [Alok Singh] change as per the comments by @jkbradley be48491 [Alok Singh] [SPARK-5562][MLlib] re-order import in alphabhetical order c01311b [Alok Singh] [SPARK-5562][MLlib] fix the newline typo b271c8a [Alok Singh] [SPARK-5562][Mllib] As per github discussion with jkbradley. We would like to simply things. 7c06251 [Alok Singh] [SPARK-5562][MLlib] modified the JavaLDASuite for test passing c710cb6 [Alok Singh] fix the scala code style to have space after : 2572a08 [Alok Singh] [SPARK-5562][MLlib] change the import xyz._ to the import xyz.{c1, c2} .. ab55fbf [Alok Singh] [SPARK-5562][MLlib] Change as per Sean Owen's comments https://github.com/apache/spark/pull/7064/files#diff-9236d23975e6f5a5608ffc81dfd79146 9f4f9ea [Alok Singh] [SPARK-5562][MLlib] LDA should handle empty document. --- docs/mllib-clustering.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'docs/mllib-clustering.md') diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md index 3aad4149f9..d72dc20a5a 100644 --- a/docs/mllib-clustering.md +++ b/docs/mllib-clustering.md @@ -447,7 +447,7 @@ It supports different inference algorithms via `setOptimizer` function. EMLDAOpt on the likelihood function and yields comprehensive results, while OnlineLDAOptimizer uses iterative mini-batch sampling for [online variational inference](https://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf) and is generally memory friendly. After fitting on the documents, LDA provides: * Topics: Inferred topics, each of which is a probability distribution over terms (words). -* Topic distributions for documents: For each document in the training set, LDA gives a probability distribution over topics. (EM only) +* Topic distributions for documents: For each non empty document in the training set, LDA gives a probability distribution over topics. (EM only). Note that for empty documents, we don't create the topic distributions. (EM only) LDA takes the following parameters: -- cgit v1.2.3