| field | value | date |
|---|---|---|
| author | wm624@hotmail.com <wm624@hotmail.com> | 2016-05-17 15:20:47 +0200 |
| committer | Nick Pentreath <nickp@za.ibm.com> | 2016-05-17 15:20:47 +0200 |
| commit | 4134ff0c657efcbf0f61eff0423215afd6132837 | |
| tree | 6b26b8824aff57e216974574b30dbfd2d20d3d81 /docs | |
| parent | c36ca651f9177f8e7a3f6a0098cba5a810ee9deb | |
[SPARK-14434][ML] User guide doc and examples for GaussianMixture in spark.ml
## What changes were proposed in this pull request?
Add a user guide section and examples for `GaussianMixture` in `spark.ml`, in Java, Scala and Python.
## How was this patch tested?
Manually compiled and tested all examples.
Author: wm624@hotmail.com <wm624@hotmail.com>
Closes #12788 from wangmiao1981/example.
Diffstat (limited to 'docs')
-rw-r--r-- | docs/ml-clustering.md | 82 |
1 files changed, 82 insertions, 0 deletions
diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md
index a0955a3855..33e4b7b0d2 100644
--- a/docs/ml-clustering.md
+++ b/docs/ml-clustering.md
@@ -148,3 +148,85 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.clustering.
 {% include_example python/ml/bisecting_k_means_example.py %}
 </div>
 </div>
+
+## Gaussian Mixture Model (GMM)
+
+A [Gaussian Mixture Model](http://en.wikipedia.org/wiki/Mixture_model#Multivariate_Gaussian_mixture_model)
+represents a composite distribution whereby points are drawn from one of *k* Gaussian sub-distributions,
+each with its own probability. The `spark.ml` implementation uses the
+[expectation-maximization](http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm)
+algorithm to induce the maximum-likelihood model given a set of samples.
+
+`GaussianMixture` is implemented as an `Estimator` and generates a `GaussianMixtureModel` as the base
+model.
+
+### Input Columns
+
+<table class="table">
+  <thead>
+    <tr>
+      <th align="left">Param name</th>
+      <th align="left">Type(s)</th>
+      <th align="left">Default</th>
+      <th align="left">Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>featuresCol</td>
+      <td>Vector</td>
+      <td>"features"</td>
+      <td>Feature vector</td>
+    </tr>
+  </tbody>
+</table>
+
+### Output Columns
+
+<table class="table">
+  <thead>
+    <tr>
+      <th align="left">Param name</th>
+      <th align="left">Type(s)</th>
+      <th align="left">Default</th>
+      <th align="left">Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>predictionCol</td>
+      <td>Int</td>
+      <td>"prediction"</td>
+      <td>Predicted cluster center</td>
+    </tr>
+    <tr>
+      <td>probabilityCol</td>
+      <td>Vector</td>
+      <td>"probability"</td>
+      <td>Probability of each cluster</td>
+    </tr>
+  </tbody>
+</table>
+
+### Example
+
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
+Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.GaussianMixture) for more details.
+
+{% include_example scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/GaussianMixture.html) for more details.
+
+{% include_example java/org/apache/spark/examples/ml/JavaGaussianMixtureExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.clustering.GaussianMixture) for more details.
+
+{% include_example python/ml/gaussian_mixture_example.py %}
+</div>
+</div>
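The guide text added by this patch says that `GaussianMixture` fits its model with expectation-maximization and that the fitted model emits an integer `prediction` alongside a `probability` vector. To make that concrete, here is a minimal pure-Python sketch of EM for a one-dimensional mixture of *k* Gaussians. It is an illustration under stated assumptions, not Spark's implementation: the helpers `fit_gmm_1d` and `predict` are hypothetical, the evenly spaced order-statistic initialization and fixed iteration count are choices made only for this sketch, and Spark itself works on multivariate feature vectors with covariance matrices.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """E-step for one point: posterior probability of each component given x."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    if total == 0.0:  # every density underflowed; fall back to a uniform guess
        return [1.0 / len(joint)] * len(joint)
    return [j / total for j in joint]


def fit_gmm_1d(data, k, iterations=100):
    """Hypothetical helper (not Spark's API): fit a k-component 1-D GMM with EM."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: evenly spaced order statistics as the starting means,
    # the overall sample variance for every component, and equal mixing weights.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: soft-assign every point to every component.
        resp = [responsibilities(x, weights, means, variances) for x in data]
        # M-step: re-estimate each component from its responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances


def predict(x, weights, means, variances):
    """Return (prediction, probability), analogous to the two output columns."""
    probability = responsibilities(x, weights, means, variances)
    prediction = max(range(len(probability)), key=lambda j: probability[j])
    return prediction, probability


if __name__ == "__main__":
    # Toy data: two well-separated groups near 0 and near 10.
    sample = [-0.5, -0.2, 0.0, 0.3, 0.5, 9.5, 9.8, 10.0, 10.2, 10.4]
    w, mu, var = fit_gmm_1d(sample, k=2)
    print("weights:", w, "means:", mu)
    print("prediction for 10.1:", predict(10.1, w, mu, var))
```

In this sketch `prediction` is simply the index of the component with the largest posterior probability, so the two outputs always agree. A more robust version would compute densities in log space and stop on a log-likelihood tolerance rather than after a fixed number of iterations.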