path: root/docs/ml-guide.md
author     sethah <seth.hendrickson16@gmail.com>    2016-05-19 23:29:37 -0700
committer  Xiangrui Meng <meng@databricks.com>      2016-05-19 23:29:37 -0700
commit     5e203505f1a092e5849ebd01d9ff9e4fc6cdc34a (patch)
tree       8b59b210bfaadfa89d922fe98ea93c0687c8da07 /docs/ml-guide.md
parent     47a2940da97caa55bbb8bb8ec1d51c9f6d5041c6 (diff)
[SPARK-15394][ML][DOCS] User guide typos and grammar audit
## What changes were proposed in this pull request?

Correct some typos and incorrectly worded sentences.

## How was this patch tested?

Doc changes only.

Note that many of these changes were identified by whomfire01

Author: sethah <seth.hendrickson16@gmail.com>

Closes #13180 from sethah/ml_guide_audit.
Diffstat (limited to 'docs/ml-guide.md')
-rw-r--r--  docs/ml-guide.md  8
1 file changed, 4 insertions, 4 deletions
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index cc353df1ec..dae86d8480 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -47,7 +47,7 @@ mostly inspired by the [scikit-learn](http://scikit-learn.org/) project.
E.g., a `DataFrame` could have different columns storing text, feature vectors, true labels, and predictions.
* **[`Transformer`](ml-guide.html#transformers)**: A `Transformer` is an algorithm which can transform one `DataFrame` into another `DataFrame`.
-E.g., an ML model is a `Transformer` which transforms `DataFrame` with features into a `DataFrame` with predictions.
+E.g., an ML model is a `Transformer` which transforms a `DataFrame` with features into a `DataFrame` with predictions.
* **[`Estimator`](ml-guide.html#estimators)**: An `Estimator` is an algorithm which can be fit on a `DataFrame` to produce a `Transformer`.
E.g., a learning algorithm is an `Estimator` which trains on a `DataFrame` and produces a model.
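
To make the `Transformer`/`Estimator` distinction in the hunk above concrete, here is a minimal Scala sketch against the `spark.ml` API. It assumes two existing DataFrames, `training` and `test`, with the conventional `label` and `features` columns; the choice of `LogisticRegression` and the column names are illustrative and not part of this patch.

```scala
import org.apache.spark.ml.classification.LogisticRegression

// LogisticRegression is an Estimator: fit() trains on a DataFrame and produces a model.
val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.01)

// `training` and `test` are assumed DataFrames with "label" and "features" columns.
// The fitted model is a Transformer: transform() maps a DataFrame with features
// into a DataFrame with prediction columns appended.
val model = lr.fit(training)
val predictions = model.transform(test)

predictions.select("features", "probability", "prediction").show()
```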
@@ -292,13 +292,13 @@ However, it is also a well-established method for choosing parameters which is m
## Example: model selection via train validation split
In addition to `CrossValidator` Spark also offers `TrainValidationSplit` for hyper-parameter tuning.
-`TrainValidationSplit` only evaluates each combination of parameters once as opposed to k times in
- case of `CrossValidator`. It is therefore less expensive,
+`TrainValidationSplit` only evaluates each combination of parameters once, as opposed to k times in
+ the case of `CrossValidator`. It is therefore less expensive,
but will not produce as reliable results when the training dataset is not sufficiently large.
`TrainValidationSplit` takes an `Estimator`, a set of `ParamMap`s provided in the `estimatorParamMaps` parameter,
and an `Evaluator`.
-It begins by splitting the dataset into two parts using `trainRatio` parameter
+It begins by splitting the dataset into two parts using the `trainRatio` parameter
which are used as separate training and test datasets. For example with `$trainRatio=0.75$` (default),
`TrainValidationSplit` will generate a training and test dataset pair where 75% of the data is used for training and 25% for validation.
Similar to `CrossValidator`, `TrainValidationSplit` also iterates through the set of `ParamMap`s.
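
The `TrainValidationSplit` behavior described in the hunk above can be illustrated with a short Scala sketch. This is a minimal example, assuming a `SparkSession` named `spark` and the sample libsvm regression dataset shipped with the Spark distribution; the estimator (`LinearRegression`), the parameter grid, and the 90/10 holdout of a final test set are illustrative choices, not part of this patch.

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}

// Assumed: a SparkSession `spark` and the sample dataset from the Spark distribution.
val data = spark.read.format("libsvm")
  .load("data/mllib/sample_linear_regression_data.txt")
val Array(training, test) = data.randomSplit(Array(0.9, 0.1), seed = 12345)

val lr = new LinearRegression()

// Each parameter combination in this grid is evaluated exactly once,
// unlike CrossValidator, which evaluates each combination k times.
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.1, 0.01))
  .addGrid(lr.elasticNetParam, Array(0.0, 0.5, 1.0))
  .build()

val trainValidationSplit = new TrainValidationSplit()
  .setEstimator(lr)
  .setEvaluator(new RegressionEvaluator)
  .setEstimatorParamMaps(paramGrid)
  .setTrainRatio(0.75)  // 75% of `training` used for fitting, 25% for validation

// Fit over the grid and keep the model with the best validation metric.
val model = trainValidationSplit.fit(training)
model.transform(test)
  .select("features", "label", "prediction")
  .show()
```

The model returned by `fit` applies the best-performing parameter combination when transforming new data, mirroring the `CrossValidator` workflow but with a single train/validation pass.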