author: Alexis Seigneurin <alexis.seigneurin@gmail.com> 2015-09-19 12:01:22 +0100
committer: Sean Owen <sowen@cloudera.com> 2015-09-19 12:01:22 +0100
commit: d83b6aae8b4357c56779cc98804eb350ab8af62d (patch)
tree: bedbf25ea07ea27b9bae98cf7ac932fa99cb57be /docs/ml-guide.md
parent: d507f9c0b7f7a524137a694ed6443747aaf90463 (diff)
Fixed links to the API
Submitting this change on the master branch as requested in https://github.com/apache/spark/pull/8819#issuecomment-141505941

Author: Alexis Seigneurin <alexis.seigneurin@gmail.com>

Closes #8838 from aseigneurin/patch-2.
Diffstat (limited to 'docs/ml-guide.md')
-rw-r--r-- docs/ml-guide.md 8
1 files changed, 4 insertions, 4 deletions
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index c5d7f99002..0427ac6695 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -619,13 +619,13 @@ for row in selected.collect():
An important task in ML is *model selection*, or using data to find the best model or parameters for a given task. This is also called *tuning*.
`Pipeline`s facilitate model selection by making it easy to tune an entire `Pipeline` at once, rather than tuning each element in the `Pipeline` separately.
-Currently, `spark.ml` supports model selection using the [`CrossValidator`](api/scala/index.html#org.apache.spark.ml.tuning.CrossValidator) class, which takes an `Estimator`, a set of `ParamMap`s, and an [`Evaluator`](api/scala/index.html#org.apache.spark.ml.Evaluator).
+Currently, `spark.ml` supports model selection using the [`CrossValidator`](api/scala/index.html#org.apache.spark.ml.tuning.CrossValidator) class, which takes an `Estimator`, a set of `ParamMap`s, and an [`Evaluator`](api/scala/index.html#org.apache.spark.ml.evaluation.Evaluator).
`CrossValidator` begins by splitting the dataset into a set of *folds* which are used as separate training and test datasets; e.g., with `$k=3$` folds, `CrossValidator` will generate 3 (training, test) dataset pairs, each of which uses 2/3 of the data for training and 1/3 for testing.
`CrossValidator` iterates through the set of `ParamMap`s. For each `ParamMap`, it trains the given `Estimator` and evaluates it using the given `Evaluator`.
-The `Evaluator` can be a [`RegressionEvaluator`](api/scala/index.html#org.apache.spark.ml.RegressionEvaluator)
-for regression problems, a [`BinaryClassificationEvaluator`](api/scala/index.html#org.apache.spark.ml.BinaryClassificationEvaluator)
-for binary data, or a [`MultiClassClassificationEvaluator`](api/scala/index.html#org.apache.spark.ml.MultiClassClassificationEvaluator)
+The `Evaluator` can be a [`RegressionEvaluator`](api/scala/index.html#org.apache.spark.ml.evaluation.RegressionEvaluator)
+for regression problems, a [`BinaryClassificationEvaluator`](api/scala/index.html#org.apache.spark.ml.evaluation.BinaryClassificationEvaluator)
+for binary data, or a [`MultiClassClassificationEvaluator`](api/scala/index.html#org.apache.spark.ml.evaluation.MultiClassClassificationEvaluator)
for multiclass problems. The default metric used to choose the best `ParamMap` can be overriden by the `setMetric`
method in each of these evaluators.
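
The fold-splitting and grid-search procedure described in the patched paragraph can be sketched in plain Python. This is only an illustration of the idea, not Spark's implementation: `k_fold_pairs`, `select_best`, and the `fit_and_score` callback are hypothetical names, the dictionaries stand in for `ParamMap`s, and the sketch assumes that scores are averaged across folds and that a larger score is better.

```python
import random


def k_fold_pairs(rows, k, seed=0):
    """Yield k (training, test) pairs; each test set is one of k disjoint folds."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled rows into k folds of (nearly) equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield training, test


def select_best(param_maps, rows, k, fit_and_score):
    """Return the parameter setting with the highest average score over k folds.

    fit_and_score(params, training, test) is a hypothetical callback that
    trains a model on `training` and returns its metric on `test`.
    Assumption: larger scores are better.
    """
    best_params, best_avg = None, float("-inf")
    for params in param_maps:
        scores = [fit_and_score(params, training, test)
                  for training, test in k_fold_pairs(rows, k)]
        avg = sum(scores) / len(scores)
        if avg > best_avg:
            best_params, best_avg = params, avg
    return best_params


# With k=3 and 12 rows, every pair uses 2/3 of the data (8 rows) for
# training and 1/3 (4 rows) for testing.
pairs = list(k_fold_pairs(range(12), k=3))
for training, test in pairs:
    print(len(training), len(test))
```

In `spark.ml` itself, the `Estimator` and `Evaluator` would play the role of `fit_and_score`; the callback is used here only to keep the sketch free of a Spark dependency.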