Diffstat (limited to 'R/pkg/vignettes/sparkr-vignettes.Rmd')
-rw-r--r-- R/pkg/vignettes/sparkr-vignettes.Rmd | 14
1 files changed, 14 insertions, 0 deletions
diff --git a/R/pkg/vignettes/sparkr-vignettes.Rmd b/R/pkg/vignettes/sparkr-vignettes.Rmd
index 36a78477dc..a7cac2f503 100644
--- a/R/pkg/vignettes/sparkr-vignettes.Rmd
+++ b/R/pkg/vignettes/sparkr-vignettes.Rmd
@@ -488,6 +488,8 @@ SparkR supports the following machine learning models and algorithms.
#### Clustering
+* Bisecting $k$-means
+
* Gaussian Mixture Model (GMM)
* $k$-means Clustering
@@ -738,6 +740,18 @@ summary(rfModel)
predictions <- predict(rfModel, df)
```
+#### Bisecting k-Means
+
+`spark.bisectingKmeans` is a form of [hierarchical clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) that uses a divisive ("top-down") approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
+
+```{r}
+df <- createDataFrame(iris)
+model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
+summary(model)
+fitted <- predict(model, df)
+head(select(fitted, "Sepal_Length", "prediction"))
+```
+
#### Gaussian Mixture Model
`spark.gaussianMixture` fits multivariate [Gaussian Mixture Model](https://en.wikipedia.org/wiki/Mixture_model#Multivariate_Gaussian_mixture_model) (GMM) against a `SparkDataFrame`. [Expectation-Maximization](https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm) (EM) is used to approximate the maximum likelihood estimator (MLE) of the model.