author | Xiangrui Meng <meng@databricks.com> | 2016-12-13 16:59:09 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2016-12-13 16:59:09 -0800 |
commit | 594b14f1ebd0b3db9f630e504be92228f11b4d9f (patch) | |
tree | 90217129249738bb03b3d824b4da2816f1c0b544 /R | |
parent | c68fb426d4ac05414fb402aa1f30f4c98df103ad (diff) | |
[SPARK-18793][SPARK-18794][R] add spark.randomForest/spark.gbt to vignettes
## What changes were proposed in this pull request?
Mention `spark.randomForest` and `spark.gbt` in vignettes. Keep the content minimal since users can type `?spark.randomForest` to see the full doc.
cc: jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #16264 from mengxr/SPARK-18793.
Diffstat (limited to 'R')
-rw-r--r-- | R/pkg/vignettes/sparkr-vignettes.Rmd | 32 |
1 files changed, 32 insertions, 0 deletions
````diff
diff --git a/R/pkg/vignettes/sparkr-vignettes.Rmd b/R/pkg/vignettes/sparkr-vignettes.Rmd
index 625b759626..334daa51f0 100644
--- a/R/pkg/vignettes/sparkr-vignettes.Rmd
+++ b/R/pkg/vignettes/sparkr-vignettes.Rmd
@@ -449,6 +449,10 @@ SparkR supports the following machine learning models and algorithms.
 
 * Generalized Linear Model (GLM)
 
+* Random Forest
+
+* Gradient-Boosted Trees (GBT)
+
 * Naive Bayes Model
 
 * $k$-means Clustering
@@ -526,6 +530,34 @@ gaussianFitted <- predict(gaussianGLM, carsDF)
 head(select(gaussianFitted, "model", "prediction", "mpg", "wt", "hp"))
 ```
 
+#### Random Forest
+
+`spark.randomForest` fits a [random forest](https://en.wikipedia.org/wiki/Random_forest) classification or regression model on a `SparkDataFrame`.
+Users can call `summary` to get a summary of the fitted model, `predict` to make predictions, and `write.ml`/`read.ml` to save/load fitted models.
+
+In the following example, we use the `longley` dataset to train a random forest and make predictions:
+
+```{r, warning=FALSE}
+df <- createDataFrame(longley)
+rfModel <- spark.randomForest(df, Employed ~ ., type = "regression", maxDepth = 2, numTrees = 2)
+summary(rfModel)
+predictions <- predict(rfModel, df)
+```
+
+#### Gradient-Boosted Trees
+
+`spark.gbt` fits a [gradient-boosted tree](https://en.wikipedia.org/wiki/Gradient_boosting) classification or regression model on a `SparkDataFrame`.
+Users can call `summary` to get a summary of the fitted model, `predict` to make predictions, and `write.ml`/`read.ml` to save/load fitted models.
+
+Similar to the random forest example above, we use the `longley` dataset to train a gradient-boosted tree and make predictions:
+
+```{r, warning=FALSE}
+df <- createDataFrame(longley)
+gbtModel <- spark.gbt(df, Employed ~ ., type = "regression", maxDepth = 2, maxIter = 2)
+summary(gbtModel)
+predictions <- predict(gbtModel, df)
+```
+
 #### Naive Bayes Model
 
 Naive Bayes model assumes independence among the features. `spark.naiveBayes` fits a [Bernoulli naive Bayes model](https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Bernoulli_naive_Bayes) against a SparkDataFrame. The data should be all categorical. These models are often used for document classification.
````
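
The vignette text in this patch says fitted models can be saved and loaded with `write.ml`/`read.ml`, but the added examples stop at `predict`. The following is a minimal sketch of that round trip, not part of the patch. It assumes a local Spark installation with the SparkR package available; the temporary `modelPath` and the variable names `reloadedModel` and `localPreds` are hypothetical choices made for illustration.

```r
library(SparkR)

# Assumption: Spark is installed locally and SparkR can start a session.
sparkR.session()

# Same data and model settings as the random forest example in the patch.
# createDataFrame rewrites the dots in longley's column names, hence the warning suppression.
df <- suppressWarnings(createDataFrame(longley))
rfModel <- spark.randomForest(df, Employed ~ ., type = "regression",
                              maxDepth = 2, numTrees = 2)

# Hypothetical save location: a fresh temporary path that does not exist yet.
modelPath <- tempfile(pattern = "spark-randomForest")
write.ml(rfModel, modelPath)

# Load the saved model back and predict with it.
reloadedModel <- read.ml(modelPath)
reloadedPredictions <- predict(reloadedModel, df)

# Bring the label and prediction columns back into a local data.frame.
localPreds <- collect(select(reloadedPredictions, "Employed", "prediction"))
print(head(localPreds))

# Clean up the saved model directory and stop the session.
unlink(modelPath, recursive = TRUE)
sparkR.session.stop()
```

The same pattern should apply to a model fitted with `spark.gbt`, since the patch documents `write.ml`/`read.ml` for both model types.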