author Xiangrui Meng <meng@databricks.com> 2016-12-13 16:59:09 -0800
committer Xiangrui Meng <meng@databricks.com> 2016-12-13 16:59:09 -0800
commit 594b14f1ebd0b3db9f630e504be92228f11b4d9f
tree 90217129249738bb03b3d824b4da2816f1c0b544 /R
parent c68fb426d4ac05414fb402aa1f30f4c98df103ad
[SPARK-18793][SPARK-18794][R] add spark.randomForest/spark.gbt to vignettes
## What changes were proposed in this pull request?

Mention `spark.randomForest` and `spark.gbt` in vignettes. Keep the content minimal since users can type `?spark.randomForest` to see the full doc.

cc: jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #16264 from mengxr/SPARK-18793.
Diffstat (limited to 'R')
-rw-r--r-- R/pkg/vignettes/sparkr-vignettes.Rmd 32
1 files changed, 32 insertions, 0 deletions
diff --git a/R/pkg/vignettes/sparkr-vignettes.Rmd b/R/pkg/vignettes/sparkr-vignettes.Rmd
index 625b759626..334daa51f0 100644
--- a/R/pkg/vignettes/sparkr-vignettes.Rmd
+++ b/R/pkg/vignettes/sparkr-vignettes.Rmd
@@ -449,6 +449,10 @@ SparkR supports the following machine learning models and algorithms.
* Generalized Linear Model (GLM)
+* Random Forest
+
+* Gradient-Boosted Trees (GBT)
+
* Naive Bayes Model
* $k$-means Clustering
@@ -526,6 +530,34 @@ gaussianFitted <- predict(gaussianGLM, carsDF)
head(select(gaussianFitted, "model", "prediction", "mpg", "wt", "hp"))
```
+#### Random Forest
+
+`spark.randomForest` fits a [random forest](https://en.wikipedia.org/wiki/Random_forest) classification or regression model on a `SparkDataFrame`.
+Users can call `summary` to get a summary of the fitted model, `predict` to make predictions, and `write.ml`/`read.ml` to save/load fitted models.
+
+In the following example, we use the `longley` dataset to train a random forest and make predictions:
+
+```{r, warning=FALSE}
+df <- createDataFrame(longley)
+rfModel <- spark.randomForest(df, Employed ~ ., type = "regression", maxDepth = 2, numTrees = 2)
+summary(rfModel)
+predictions <- predict(rfModel, df)
+```
+
+#### Gradient-Boosted Trees
+
+`spark.gbt` fits a [gradient-boosted tree](https://en.wikipedia.org/wiki/Gradient_boosting) classification or regression model on a `SparkDataFrame`.
+Users can call `summary` to get a summary of the fitted model, `predict` to make predictions, and `write.ml`/`read.ml` to save/load fitted models.
+
+Similar to the random forest example above, we use the `longley` dataset to train a gradient-boosted tree and make predictions:
+
+```{r, warning=FALSE}
+df <- createDataFrame(longley)
+gbtModel <- spark.gbt(df, Employed ~ ., type = "regression", maxDepth = 2, maxIter = 2)
+summary(gbtModel)
+predictions <- predict(gbtModel, df)
+```
+
#### Naive Bayes Model
Naive Bayes model assumes independence among the features. `spark.naiveBayes` fits a [Bernoulli naive Bayes model](https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Bernoulli_naive_Bayes) against a SparkDataFrame. The data should be all categorical. These models are often used for document classification.