author: Yanbo Liang <ybliang8@gmail.com> 2016-07-11 14:31:11 -0700
committer: Shivaram Venkataraman <shivaram@cs.berkeley.edu> 2016-07-11 14:31:11 -0700
commit: 2ad031be67c7a0f0c4895c084c891330a9ec935e
tree: 1972b9f3226ca0026db712b6c32faba47f23b2e1 /docs/sparkr.md
parent: 840853ed06d63694bf98b21a889a960aac6ac0ac
[SPARKR][DOC] SparkR ML user guides update for 2.0
## What changes were proposed in this pull request?

* Update SparkR ML section to make them consistent with SparkR API docs.
* Since #13972 adds labelling support for the ```include_example``` Jekyll plugin, so that we can split the single ```ml.R``` example file into multiple line blocks with different labels, and include them in different algorithms/models in the generated HTML page.

## How was this patch tested?

Only docs update, manually check the generated docs.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #14011 from yanboliang/r-user-guide-update.
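
To illustrate the labelling mentioned above, here is a minimal sketch of how one labelled block inside an example file such as `ml.R` might look. The `$example on:<label>$` / `$example off:<label>$` marker syntax is an assumption about the plugin's convention and should be checked against #13972; the data set and variable names are illustrative and are not taken from the real `ml.R`.

```r
# Hypothetical excerpt of a labelled example file (not the actual ml.R).
library(SparkR)
sparkR.session()

# Assumed marker syntax: only the lines between the "on" and "off"
# markers for a given label would be pulled into the generated page.
# $example on:kmeans$
df <- createDataFrame(iris)  # dots in column names become underscores
model <- spark.kmeans(df, ~ Sepal_Length + Sepal_Width, k = 3)
summary(model)
# $example off:kmeans$

sparkR.session.stop()
```

Under that assumption, a tag such as `{% include_example kmeans r/ml.R %}` would render only the block labelled `kmeans`, which is what lets each algorithm section show its own snippet.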
Diffstat (limited to 'docs/sparkr.md')
-rw-r--r-- docs/sparkr.md 43
1 files changed, 25 insertions, 18 deletions
diff --git a/docs/sparkr.md b/docs/sparkr.md
index 32ef815eb1..b4acb23040 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -355,32 +355,39 @@ head(teenagers)
# Machine Learning
-SparkR supports the following Machine Learning algorithms.
+SparkR supports the following machine learning algorithms currently: `Generalized Linear Model`, `Accelerated Failure Time (AFT) Survival Regression Model`, `Naive Bayes Model` and `KMeans Model`.
+Under the hood, SparkR uses MLlib to train the model.
+Users can call `summary` to print a summary of the fitted model, [predict](api/R/predict.html) to make predictions on new data, and [write.ml](api/R/write.ml.html)/[read.ml](api/R/read.ml.html) to save/load fitted models.
+SparkR supports a subset of the available R formula operators for model fitting, including ‘~’, ‘.’, ‘:’, ‘+’, and ‘-‘.
-* Generalized Linear Regression Model [spark.glm()](api/R/spark.glm.html)
-* Naive Bayes [spark.naiveBayes()](api/R/spark.naiveBayes.html)
-* KMeans [spark.kmeans()](api/R/spark.kmeans.html)
-* AFT Survival Regression [spark.survreg()](api/R/spark.survreg.html)
+## Algorithms
-[Generalized Linear Regression](api/R/spark.glm.html) can be used to train a model from a specified family. Currently the Gaussian, Binomial, Poisson and Gamma families are supported. We support a subset of the available R formula operators for model fitting, including '~', '.', ':', '+', and '-'.
+### Generalized Linear Model
-The [summary()](api/R/summary.html) function gives the summary of a model produced by different algorithms listed above.
-It produces the similar result compared with R summary function.
+[spark.glm()](api/R/spark.glm.html) or [glm()](api/R/glm.html) fits generalized linear model against a Spark DataFrame.
+Currently "gaussian", "binomial", "poisson" and "gamma" families are supported.
+{% include_example glm r/ml.R %}
-## Model persistence
+### Accelerated Failure Time (AFT) Survival Regression Model
+
+[spark.survreg()](api/R/spark.survreg.html) fits an accelerated failure time (AFT) survival regression model on a SparkDataFrame.
+Note that the formula of [spark.survreg()](api/R/spark.survreg.html) does not support operator '.' currently.
+{% include_example survreg r/ml.R %}
+
+### Naive Bayes Model
-* [write.ml](api/R/write.ml.html) allows users to save a fitted model in a given input path
-* [read.ml](api/R/read.ml.html) allows users to read/load the model which was saved using write.ml in a given path
+[spark.naiveBayes()](api/R/spark.naiveBayes.html) fits a Bernoulli naive Bayes model against a SparkDataFrame. Only categorical data is supported.
+{% include_example naiveBayes r/ml.R %}
-Model persistence is supported for all Machine Learning algorithms for all families.
+### KMeans Model
-The examples below show how to build several models:
-* GLM using the Gaussian and Binomial model families
-* AFT survival regression model
-* Naive Bayes model
-* K-Means model
+[spark.kmeans()](api/R/spark.kmeans.html) fits a k-means clustering model against a Spark DataFrame, similarly to R's kmeans().
+{% include_example kmeans r/ml.R %}
+
+## Model persistence
-{% include_example r/ml.R %}
+The following example shows how to save/load a MLlib model by SparkR.
+{% include_example read_write r/ml.R %}
# R Function Name Conflicts
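
Note on the diff: the workflow described by the added introduction (fit a model, call `summary`, call `predict`, then `write.ml`/`read.ml`) can be sketched in SparkR roughly as follows. This is not the content of `r/ml.R`; the data set, formula, variable names, and temporary path are illustrative assumptions, and it presumes a local Spark 2.0 installation.

```r
library(SparkR)
sparkR.session()

# createDataFrame replaces dots in the iris column names with underscores.
df <- createDataFrame(iris)

# Fit a generalized linear model of the "gaussian" family.
model <- spark.glm(df, Sepal_Length ~ Sepal_Width + Species, family = "gaussian")

# Print a summary of the fitted model.
summary(model)

# Make predictions; the result carries an added "prediction" column.
preds <- predict(model, df)
head(select(preds, "Sepal_Length", "prediction"))

# Save the fitted model to a path that does not exist yet, then load it back.
path <- tempfile(pattern = "glm-model")
write.ml(model, path)
restored <- read.ml(path)
summary(restored)

# Clean up the saved model directory and stop the session.
unlink(path, recursive = TRUE)
sparkR.session.stop()
```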