aboutsummaryrefslogtreecommitdiff
path: root/docs/mllib-clustering.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/mllib-clustering.md')
-rw-r--r--docs/mllib-clustering.md4
1 files changed, 4 insertions, 0 deletions
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index 3fb35d3c50..c2711cf82d 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -507,6 +507,10 @@ must also be $> 1.0$. Providing `Vector(-1)` results in default behavior
$> 1.0$. Providing `-1` results in defaulting to a value of $0.1 + 1$.
* `maxIterations`: The maximum number of EM iterations.
+*Note*: It is important to do enough iterations. In early iterations, EM often has useless topics,
+but those topics improve dramatically after more iterations. Using at least 20 and possibly
+50-100 iterations is often reasonable, depending on your dataset.
+
`EMLDAOptimizer` produces a `DistributedLDAModel`, which stores not only
the inferred topics but also the full training corpus and topic
distributions for each document in the training corpus. A