aboutsummaryrefslogtreecommitdiff
path: root/python/pyspark
diff options
context:
space:
mode:
authorLiang-Chi Hsieh <viirya@gmail.com>2015-06-20 13:01:59 -0700
committerJoseph K. Bradley <joseph@databricks.com>2015-06-20 13:01:59 -0700
commit0b8995168f02bb55afb0a5b7dbdb941c3c89cb4c (patch)
tree64a27502be793519bed306017f558f1a3fb15044 /python/pyspark
parent1b6fe9b1a70aa3f81448c2705ea3a4b501cbda9d (diff)
downloadspark-0b8995168f02bb55afb0a5b7dbdb941c3c89cb4c.tar.gz
spark-0b8995168f02bb55afb0a5b7dbdb941c3c89cb4c.tar.bz2
spark-0b8995168f02bb55afb0a5b7dbdb941c3c89cb4c.zip
[SPARK-8468] [ML] Take the negative of some metrics in RegressionEvaluator to get correct cross validation
JIRA: https://issues.apache.org/jira/browse/SPARK-8468 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #6905 from viirya/cv_min and squashes the following commits: 930d3db [Liang-Chi Hsieh] Fix python unit test and add document. d632135 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cv_min 16e3b2c [Liang-Chi Hsieh] Take the negative instead of reciprocal. c3dd8d9 [Liang-Chi Hsieh] For comments. b5f52c1 [Liang-Chi Hsieh] Add param to CrossValidator for choosing whether to maximize evaluation value.
Diffstat (limited to 'python/pyspark')
-rw-r--r--python/pyspark/ml/evaluation.py8
1 file changed, 5 insertions, 3 deletions
diff --git a/python/pyspark/ml/evaluation.py b/python/pyspark/ml/evaluation.py
index d8ddb78c6d..595593a7f2 100644
--- a/python/pyspark/ml/evaluation.py
+++ b/python/pyspark/ml/evaluation.py
@@ -160,13 +160,15 @@ class RegressionEvaluator(JavaEvaluator, HasLabelCol, HasPredictionCol):
...
>>> evaluator = RegressionEvaluator(predictionCol="raw")
>>> evaluator.evaluate(dataset)
- 2.842...
+ -2.842...
>>> evaluator.evaluate(dataset, {evaluator.metricName: "r2"})
0.993...
>>> evaluator.evaluate(dataset, {evaluator.metricName: "mae"})
- 2.649...
+ -2.649...
"""
- # a placeholder to make it appear in the generated doc
+ # Because we will maximize evaluation value (ref: `CrossValidator`),
+ # when we evaluate a metric that is needed to minimize (e.g., `"rmse"`, `"mse"`, `"mae"`),
+ # we take and output the negative of this metric.
metricName = Param(Params._dummy(), "metricName",
"metric name in evaluation (mse|rmse|r2|mae)")