From 1d703660d4d14caea697affdf31170aea44c8903 Mon Sep 17 00:00:00 2001
From: Yuhao Yang
Date: Tue, 12 May 2015 15:12:29 -0700
Subject: [SPARK-7496] [MLLIB] Update Programming guide with Online LDA

jira: https://issues.apache.org/jira/browse/SPARK-7496

Update the LDA subsection of the clustering section of the MLlib programming guide to include OnlineLDA.

Author: Yuhao Yang

Closes #6046 from hhbyyh/ldaDocument and squashes the following commits:

4b6fbfa [Yuhao Yang] add online paper and some comparison
fd4c983 [Yuhao Yang] update lda document for optimizers
---
 docs/mllib-clustering.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index f5aa15b7d9..f41ca70952 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -377,11 +377,11 @@ LDA can be thought of as a clustering algorithm as follows:
   on a statistical model of how text documents are generated.
 
 LDA takes in a collection of documents as vectors of word counts.
-It learns clustering using [expectation-maximization](http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm)
-on the likelihood function. After fitting on the documents, LDA provides:
+It supports different inference algorithms via the `setOptimizer` function. EMLDAOptimizer learns clustering using [expectation-maximization](http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm)
+on the likelihood function and yields comprehensive results, while OnlineLDAOptimizer uses iterative mini-batch sampling for [online variational inference](https://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf) and is generally memory-friendly. After fitting on the documents, LDA provides:
 
 * Topics: Inferred topics, each of which is a probability distribution over terms (words).
-* Topic distributions for documents: For each document in the training set, LDA gives a probability distribution over topics.
+* Topic distributions for documents: For each document in the training set, LDA gives a probability distribution over topics. (EM only)
 
 LDA takes the following parameters:
 
--
cgit v1.2.3
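
As a minimal sketch of the API the updated passage describes, the snippet below selects between the two optimizers through `setOptimizer`, assuming the Spark 1.4-era `spark.mllib` clustering package and an existing `SparkContext` named `sc` (as the other examples on that page do); the toy corpus and parameter values are illustrative only and do not come from the patch.

```scala
import org.apache.spark.mllib.clustering.{DistributedLDAModel, LDA, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vectors

// Toy corpus of (documentId, termCountVector) pairs; the counts are made up.
val corpus = sc.parallelize(Seq(
  (0L, Vectors.dense(1.0, 2.0, 6.0, 0.0, 2.0)),
  (1L, Vectors.dense(1.0, 3.0, 0.0, 1.0, 3.0)),
  (2L, Vectors.dense(1.0, 4.0, 1.0, 0.0, 4.0))
))

// EM is the default optimizer; passing an OnlineLDAOptimizer switches to online
// variational inference. miniBatchFraction = 1.0 only because this corpus is tiny;
// real data would use a small fraction of the documents per iteration.
val onlineModel = new LDA()
  .setK(3)
  .setOptimizer(new OnlineLDAOptimizer().setMiniBatchFraction(1.0))
  .run(corpus)

// Inferred topics (term distributions) are available from either optimizer.
val topics = onlineModel.topicsMatrix

// Per-document topic distributions are exposed only by the EM optimizer,
// whose run() returns a DistributedLDAModel.
val emModel = new LDA().setK(3).run(corpus).asInstanceOf[DistributedLDAModel]
val docTopics = emModel.topicDistributions // RDD[(documentId, topicDistribution)]
```

This mirrors the tradeoff the patch describes: EM yields the more comprehensive DistributedLDAModel, while the online optimizer trades that for mini-batch updates that are generally lighter on memory.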