field | value | date |
---|---|---|
author | sethah <seth.hendrickson16@gmail.com> | 2016-05-19 23:29:37 -0700 |
committer | Xiangrui Meng <meng@databricks.com> | 2016-05-19 23:29:37 -0700 |
commit | 5e203505f1a092e5849ebd01d9ff9e4fc6cdc34a (patch) | |
tree | 8b59b210bfaadfa89d922fe98ea93c0687c8da07 /docs/ml-guide.md | |
parent | 47a2940da97caa55bbb8bb8ec1d51c9f6d5041c6 (diff) | |
[SPARK-15394][ML][DOCS] User guide typos and grammar audit
## What changes were proposed in this pull request?
Correct some typos and incorrectly worded sentences.
## How was this patch tested?
Doc changes only.
Note that many of these changes were identified by whomfire01
Author: sethah <seth.hendrickson16@gmail.com>
Closes #13180 from sethah/ml_guide_audit.
Diffstat (limited to 'docs/ml-guide.md')
-rw-r--r-- | docs/ml-guide.md | 8 |
1 files changed, 4 insertions, 4 deletions
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index cc353df1ec..dae86d8480 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -47,7 +47,7 @@ mostly inspired by the [scikit-learn](http://scikit-learn.org/) project.
 E.g., a `DataFrame` could have different columns storing text, feature vectors, true labels, and predictions.
 
 * **[`Transformer`](ml-guide.html#transformers)**: A `Transformer` is an algorithm which can transform one `DataFrame` into another `DataFrame`.
-E.g., an ML model is a `Transformer` which transforms `DataFrame` with features into a `DataFrame` with predictions.
+E.g., an ML model is a `Transformer` which transforms a `DataFrame` with features into a `DataFrame` with predictions.
 
 * **[`Estimator`](ml-guide.html#estimators)**: An `Estimator` is an algorithm which can be fit on a `DataFrame` to produce a `Transformer`.
 E.g., a learning algorithm is an `Estimator` which trains on a `DataFrame` and produces a model.
@@ -292,13 +292,13 @@ However, it is also a well-established method for choosing parameters which is m
 ## Example: model selection via train validation split
 
 In addition to `CrossValidator` Spark also offers `TrainValidationSplit` for hyper-parameter tuning.
-`TrainValidationSplit` only evaluates each combination of parameters once as opposed to k times in
- case of `CrossValidator`. It is therefore less expensive,
+`TrainValidationSplit` only evaluates each combination of parameters once, as opposed to k times in
+ the case of `CrossValidator`. It is therefore less expensive,
  but will not produce as reliable results when the training dataset is not sufficiently large.
 
 `TrainValidationSplit` takes an `Estimator`, a set of `ParamMap`s provided in the `estimatorParamMaps` parameter, and an `Evaluator`.
-It begins by splitting the dataset into two parts using `trainRatio` parameter
+It begins by splitting the dataset into two parts using the `trainRatio` parameter
 which are used as separate training and test datasets.
 For example with `$trainRatio=0.75$` (default), `TrainValidationSplit` will generate a training and test dataset pair where 75% of the data is used for training and 25% for validation.
 Similar to `CrossValidator`, `TrainValidationSplit` also iterates through the set of `ParamMap`s.
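For readers skimming the second hunk, the behaviour the guide text describes (one split controlled by `trainRatio`, then a single evaluation per parameter combination) can be sketched in plain Python. This is an illustration only, not Spark's implementation or API: the function names `train_validation_split` and `select_params`, the `fit`/`evaluate` callables, and the toy data are all hypothetical.

```python
import random


def train_validation_split(rows, train_ratio=0.75, seed=42):
    """Shuffle rows and split them once into (training, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def select_params(rows, param_maps, fit, evaluate, train_ratio=0.75):
    """Fit once per parameter map and return the (score, params) pair
    with the highest validation score. Higher scores are assumed better."""
    train, validation = train_validation_split(rows, train_ratio)
    scored = [(evaluate(fit(train, params), validation), params)
              for params in param_maps]
    return max(scored, key=lambda pair: pair[0])


# Toy data (hypothetical): y = 2 * x for x in 0..99.
rows = [(x, 2 * x) for x in range(100)]
train, validation = train_validation_split(rows)

# Hypothetical "estimator": the model is just the candidate slope itself,
# so no real training happens here.
def fit(train_rows, params):
    return params["slope"]

# Hypothetical "evaluator": negative mean absolute error on validation rows.
def evaluate(slope, validation_rows):
    errors = [abs(slope * x - y) for x, y in validation_rows]
    return -sum(errors) / len(errors)

param_maps = [{"slope": 1.0}, {"slope": 2.0}, {"slope": 3.0}]
best_score, best_params = select_params(rows, param_maps, fit, evaluate)
```

Each candidate in `param_maps` is scored exactly once on the same held-out 25%, which is why this approach is cheaper than k-fold cross-validation but more sensitive to how the single split happens to fall.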