From 26d70bd2b42617ff731b6e9e6d77933b38597ebe Mon Sep 17 00:00:00 2001
From: Yu ISHIKAWA
Date: Wed, 16 Dec 2015 10:43:45 -0800
Subject: [SPARK-12215][ML][DOC] User guide section for KMeans in spark.ml

cc jkbradley

Author: Yu ISHIKAWA

Closes #10244 from yu-iskw/SPARK-12215.
---
 docs/ml-clustering.md | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

(limited to 'docs/ml-clustering.md')

diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md
index a59f7e3005..440c455cd0 100644
--- a/docs/ml-clustering.md
+++ b/docs/ml-clustering.md
@@ -11,6 +11,77 @@ In this section, we introduce the pipeline API for [clustering in mllib](mllib-c
 * This will become a table of contents (this text will be scraped).
 {:toc}
 
+## K-means
+
+[k-means](http://en.wikipedia.org/wiki/K-means_clustering) is one of the
+most commonly used clustering algorithms that clusters the data points into a
+predefined number of clusters. The MLlib implementation includes a parallelized
+variant of the [k-means++](http://en.wikipedia.org/wiki/K-means%2B%2B) method
+called [kmeans||](http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf).
+
+`KMeans` is implemented as an `Estimator` and generates a `KMeansModel` as the base model.
+
+### Input Columns
+
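+<table>
+  <thead>
+    <tr>
+      <th>Param name</th>
+      <th>Type(s)</th>
+      <th>Default</th>
+      <th>Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>featuresCol</td>
+      <td>Vector</td>
+      <td>"features"</td>
+      <td>Feature vector</td>
+    </tr>
+  </tbody>
+</table>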
+
+### Output Columns
+
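+<table>
+  <thead>
+    <tr>
+      <th>Param name</th>
+      <th>Type(s)</th>
+      <th>Default</th>
+      <th>Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>predictionCol</td>
+      <td>Int</td>
+      <td>"prediction"</td>
+      <td>Predicted cluster center</td>
+    </tr>
+  </tbody>
+</table>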
+
+### Example
+
+
+
+
+Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.KMeans) for more details.
+
+{% include_example scala/org/apache/spark/examples/ml/KMeansExample.scala %}
+
+
+
+Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/KMeans.html) for more details.
+
+{% include_example java/org/apache/spark/examples/ml/JavaKMeansExample.java %}
+
+
+
+
+
 
 ## Latent Dirichlet allocation (LDA)
 
 `LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`,
--
cgit v1.2.3
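
Note for readers of this patch: the `KMeansExample.scala` and `JavaKMeansExample.java` files pulled in by `include_example` are not part of this diff. To illustrate what the documented columns mean (a feature vector goes in, the index of the nearest cluster center comes out), here is a minimal plain-Java sketch of the classic Lloyd iteration. It is an assumption-laden illustration only: it does not use Spark, it does not implement the kmeans|| initialization mentioned in the guide, and the names `KMeansSketch`, `fit`, and `predict` are hypothetical rather than taken from the Spark API.

```java
import java.util.Arrays;

/** Plain-Java sketch of Lloyd's k-means iteration; NOT Spark's implementation. */
public class KMeansSketch {

    /** Squared Euclidean distance between two vectors of equal length. */
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the nearest center; loosely analogous to the "prediction" column. */
    public static int predict(double[][] centers, double[] point) {
        int best = 0;
        for (int k = 1; k < centers.length; k++) {
            if (squaredDistance(centers[k], point) < squaredDistance(centers[best], point)) {
                best = k;
            }
        }
        return best;
    }

    /** Runs at most maxIter assignment/update rounds, starting from caller-supplied centers. */
    public static double[][] fit(double[][] points, double[][] initialCenters, int maxIter) {
        int k = initialCenters.length;
        int dim = initialCenters[0].length;
        double[][] centers = new double[k][];
        for (int j = 0; j < k; j++) {
            centers[j] = initialCenters[j].clone();
        }
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: accumulate per-cluster sums and counts.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = predict(centers, p);
                counts[c]++;
                for (int d = 0; d < dim; d++) {
                    sums[c][d] += p[d];
                }
            }
            // Update step: move each non-empty cluster's center to the mean of its points.
            boolean changed = false;
            for (int j = 0; j < k; j++) {
                if (counts[j] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                for (int d = 0; d < dim; d++) {
                    double updated = sums[j][d] / counts[j];
                    if (updated != centers[j][d]) {
                        changed = true;
                    }
                    centers[j][d] = updated;
                }
            }
            if (!changed) {
                break; // converged: no center moved in this round
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: two well-separated groups in two dimensions.
        double[][] points = {{0.0, 0.0}, {0.2, 0.0}, {9.0, 9.0}, {9.2, 9.0}};
        double[][] initial = {{0.0, 0.0}, {9.0, 9.0}};
        double[][] centers = fit(points, initial, 10);
        System.out.println("centers: " + Arrays.deepToString(centers));
        System.out.println("prediction: " + predict(centers, new double[] {0.1, 0.1}));
    }
}
```

In this sketch the initial centers are passed in explicitly so the run is deterministic; a real deployment would rely on the library's own initialization and parameters, as described in the API docs linked from the patch.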