author: Jeff Zhang <zjffdu@gmail.com> 2016-04-29 10:42:52 -0700
committer: Joseph K. Bradley <joseph@databricks.com> 2016-04-29 10:42:52 -0700
commit: 775772de36d5b7e80595aad850aa1dcea8791688
tree: 2b2f67da23565dd7c1ac2d6758bfe502c2c76cd5 /mllib
parent: f08dcdb8d33d2a40573547ae8543e409b6ab9e59
[SPARK-11940][PYSPARK][ML] Python API for ml.clustering.LDA PR2
## What changes were proposed in this pull request?

pyspark.ml API for LDA
* LDA, LDAModel, LocalLDAModel, DistributedLDAModel
* includes persistence

This replaces [https://github.com/apache/spark/pull/10242]

## How was this patch tested?

* doc test for LDA, including Param setters
* unit test for persistence

Author: Joseph K. Bradley <joseph@databricks.com>
Author: Jeff Zhang <zjffdu@apache.org>

Closes #12723 from jkbradley/zjffdu-SPARK-11940.
Diffstat (limited to 'mllib')
-rw-r--r-- mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala | 7
1 files changed, 3 insertions, 4 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala
index 1554d568af..38ecc5a102 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala
@@ -355,7 +355,7 @@ private[clustering] trait LDAParams extends Params with HasFeaturesCol with HasM
* :: Experimental ::
* Model fitted by [[LDA]].
*
- * @param vocabSize Vocabulary size (number of terms or terms in the vocabulary)
+ * @param vocabSize Vocabulary size (number of terms or words in the vocabulary)
* @param sparkSession Used to construct local DataFrames for returning query results
*/
@Since("1.6.0")
@@ -745,9 +745,8 @@ object DistributedLDAModel extends MLReadable[DistributedLDAModel] {
* - "topic": multinomial distribution over terms representing some concept
* - "document": one piece of text, corresponding to one row in the input data
*
- * References:
- * - Original LDA paper (journal version):
- * Blei, Ng, and Jordan. "Latent Dirichlet Allocation." JMLR, 2003.
+ * Original LDA paper (journal version):
+ * Blei, Ng, and Jordan. "Latent Dirichlet Allocation." JMLR, 2003.
*
* Input data (featuresCol):
* LDA is given a collection of documents as input data, via the featuresCol parameter.