aboutsummaryrefslogtreecommitdiff
path: root/docs/ml-clustering.md
diff options
context:
space:
mode:
authorwm624@hotmail.com <wm624@hotmail.com>2016-05-17 15:20:47 +0200
committerNick Pentreath <nickp@za.ibm.com>2016-05-17 15:20:47 +0200
commit4134ff0c657efcbf0f61eff0423215afd6132837 (patch)
tree6b26b8824aff57e216974574b30dbfd2d20d3d81 /docs/ml-clustering.md
parentc36ca651f9177f8e7a3f6a0098cba5a810ee9deb (diff)
downloadspark-4134ff0c657efcbf0f61eff0423215afd6132837.tar.gz
spark-4134ff0c657efcbf0f61eff0423215afd6132837.tar.bz2
spark-4134ff0c657efcbf0f61eff0423215afd6132837.zip
[SPARK-14434][ML] User guide doc and examples for GaussianMixture in spark.ml
## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) Add guide doc and examples for GaussianMixture in Spark.ml in Java, Scala and Python. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Manual compile and test all examples Author: wm624@hotmail.com <wm624@hotmail.com> Closes #12788 from wangmiao1981/example.
Diffstat (limited to 'docs/ml-clustering.md')
-rw-r--r--docs/ml-clustering.md82
1 files changed, 82 insertions, 0 deletions
diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md
index a0955a3855..33e4b7b0d2 100644
--- a/docs/ml-clustering.md
+++ b/docs/ml-clustering.md
@@ -148,3 +148,85 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.clustering.
{% include_example python/ml/bisecting_k_means_example.py %}
</div>
</div>
+
+## Gaussian Mixture Model (GMM)
+
+A [Gaussian Mixture Model](http://en.wikipedia.org/wiki/Mixture_model#Multivariate_Gaussian_mixture_model)
+represents a composite distribution whereby points are drawn from one of *k* Gaussian sub-distributions,
+each with its own probability. The `spark.ml` implementation uses the
+[expectation-maximization](http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm)
+algorithm to induce the maximum-likelihood model given a set of samples.
+
+`GaussianMixture` is implemented as an `Estimator` and generates a `GaussianMixtureModel` as the base
+model.
+
+### Input Columns
+
+<table class="table">
+ <thead>
+ <tr>
+ <th align="left">Param name</th>
+ <th align="left">Type(s)</th>
+ <th align="left">Default</th>
+ <th align="left">Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>featuresCol</td>
+ <td>Vector</td>
+ <td>"features"</td>
+ <td>Feature vector</td>
+ </tr>
+ </tbody>
+</table>
+
+### Output Columns
+
+<table class="table">
+ <thead>
+ <tr>
+ <th align="left">Param name</th>
+ <th align="left">Type(s)</th>
+ <th align="left">Default</th>
+ <th align="left">Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>predictionCol</td>
+ <td>Int</td>
+ <td>"prediction"</td>
+ <td>Predicted cluster center</td>
+ </tr>
+ <tr>
+ <td>probabilityCol</td>
+ <td>Vector</td>
+ <td>"probability"</td>
+ <td>Probability of each cluster</td>
+ </tr>
+ </tbody>
+</table>
+
+### Example
+
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
+Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.GaussianMixture) for more details.
+
+{% include_example scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/GaussianMixture.html) for more details.
+
+{% include_example java/org/apache/spark/examples/ml/JavaGaussianMixtureExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.clustering.GaussianMixture) for more details.
+
+{% include_example python/ml/gaussian_mixture_example.py %}
+</div>
+</div>