author: Yuhao Yang <hhbyyh@gmail.com> 2015-11-30 14:56:51 -0800
committer: Xiangrui Meng <meng@databricks.com> 2015-11-30 14:56:51 -0800
commit: e232720a65dfb9ae6135cbb7674e35eddd88d625
tree: 1ae892140c2fce646fff10a34cd64e9ac3d49955 /docs
parent: a8ceec5e8c1572dd3d74783c06c78b7ca0b8a7ce
[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml
jira: https://issues.apache.org/jira/browse/SPARK-11689

Add a simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include the example code in the user guide markdown. See SPARK-11606 for instructions.

The original PR was reverted due to a documentation build error: https://github.com/apache/spark/pull/9722

mengxr feynmanliang yinxusen Sorry for the trouble.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #9974 from hhbyyh/ldaMLExample.
Diffstat (limited to 'docs')
-rw-r--r-- docs/ml-clustering.md | 31
-rw-r--r-- docs/ml-guide.md | 3
-rw-r--r-- docs/mllib-guide.md | 1
3 files changed, 34 insertions, 1 deletions
diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md
new file mode 100644
index 0000000000..cfefb5dfbd
--- /dev/null
+++ b/docs/ml-clustering.md
@@ -0,0 +1,31 @@
+---
+layout: global
+title: Clustering - ML
+displayTitle: <a href="ml-guide.html">ML</a> - Clustering
+---
+
+In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html).
+
+## Latent Dirichlet allocation (LDA)
+
+`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`,
+and generates a `LDAModel` as the base models. Expert users may cast a `LDAModel` generated by
+`EMLDAOptimizer` to a `DistributedLDAModel` if needed.
+
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
+
+Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details.
+
+{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+
+Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details.
+
+{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %}
+</div>
+
+</div>
\ No newline at end of file
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index be18a05361..6f35b30c3d 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -40,6 +40,7 @@ Also, some algorithms have additional capabilities in the `spark.ml` API; e.g.,
provide class probabilities, and linear models provide model summaries.
* [Feature extraction, transformation, and selection](ml-features.html)
+* [Clustering](ml-clustering.html)
* [Decision Trees for classification and regression](ml-decision-tree.html)
* [Ensembles](ml-ensembles.html)
* [Linear methods with elastic net regularization](ml-linear-methods.html)
@@ -950,4 +951,4 @@ model.transform(test)
{% endhighlight %}
</div>
-</div>
+</div>
\ No newline at end of file
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 91e50ccfec..54e35fcbb1 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -69,6 +69,7 @@ We list major functionality from both below, with links to detailed guides.
concepts. It also contains sections on using algorithms within the Pipelines API, for example:
* [Feature extraction, transformation, and selection](ml-features.html)
+* [Clustering](ml-clustering.html)
* [Decision trees for classification and regression](ml-decision-tree.html)
* [Ensembles](ml-ensembles.html)
* [Linear methods with elastic net regularization](ml-linear-methods.html)
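
Reviewer note on `include_example`: the commit message says the guide pulls its code from the files under examples/ instead of inlining it. The sketch below illustrates the general idea of marker-based snippet selection, where only the lines between an "on" and an "off" comment marker are copied into the rendered page. It is not the plugin's code. The class name `ExampleExtractor` is hypothetical, and the marker strings `$example on$` and `$example off$` are an assumption about the convention used by the Spark docs tooling; check `docs/_plugins` in your checkout before relying on them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: mimics marker-based snippet selection.
final class ExampleExtractor {
    // Assumed marker strings; verify against the docs plugin in your Spark checkout.
    static final String ON = "$example on$";
    static final String OFF = "$example off$";

    // Returns only the lines that sit between an ON marker and the next OFF marker.
    // The marker lines themselves are dropped. Several ON/OFF regions are allowed.
    static List<String> extract(List<String> lines) {
        List<String> selected = new ArrayList<>();
        boolean inside = false;
        for (String line : lines) {
            if (line.contains(ON)) {
                inside = true;
                continue;
            }
            if (line.contains(OFF)) {
                inside = false;
                continue;
            }
            if (inside) {
                selected.add(line);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // A made-up example file: setup and teardown stay outside the markers.
        List<String> source = Arrays.asList(
            "package org.apache.spark.examples.ml;",
            "// $example on$",
            "LDA lda = new LDA().setK(10);",
            "// $example off$",
            "sc.stop();");
        extract(source).forEach(System.out::println);
    }
}
```

Running `main` prints only the `LDA lda = new LDA().setK(10);` line. Keeping the snippet in a compiled example file means the code shown in the guide is built along with the rest of the examples, which is the motivation the commit message points to via SPARK-11606.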