Diffstat (limited to 'docs/mllib-ensembles.md')
-rw-r--r-- docs/mllib-ensembles.md | 11
1 files changed, 11 insertions, 0 deletions
diff --git a/docs/mllib-ensembles.md b/docs/mllib-ensembles.md
index fb90b70399..00040e6073 100644
--- a/docs/mllib-ensembles.md
+++ b/docs/mllib-ensembles.md
@@ -427,6 +427,17 @@ We omit some decision tree parameters since those are covered in the [decision t
* **`algo`**: The algorithm or task (classification vs. regression) is set using the tree [Strategy] parameter.
+#### Validation while training
+
+Gradient boosting can overfit as more trees are added. To prevent overfitting, it is useful to validate while
+training. The method `runWithValidation` is provided for this purpose. It takes a pair of RDDs as arguments, the
+first being the training dataset and the second being the validation dataset.
+
+Training stops when the improvement in the validation error is not more than a certain tolerance
+(supplied by the `validationTol` argument in `BoostingStrategy`). In practice, the validation error
+decreases initially and later increases. There might be cases in which the validation error does not change monotonically,
+and the user is advised to set a large enough negative tolerance and examine the validation curve to tune the number of
+iterations.
### Examples