[SPARK-9090] [ML] Fix definition of residual in LinearRegressionSummary, EnsembleTestHelper, and SquaredError

Make the definition of residuals in Spark consistent with literature. We have been using `prediction - label` for residuals, but literature usually defines `residual = label - prediction`.

Author: Feynman Liang <fliang@databricks.com>

Closes #7435 from feynmanliang/SPARK-9090-Fix-LinearRegressionSummary-Residuals and squashes the following commits:

f4b39d8 [Feynman Liang] Fix doc
bc12a92 [Feynman Liang] Tweak EnsembleTestHelper and SquaredError residuals
63f0d60 [Feynman Liang] Fix definition of residual
author: Feynman Liang <fliang@databricks.com> 2015-07-17 14:00:53 -0700
committer: Joseph K. Bradley <joseph@databricks.com> 2015-07-17 14:00:53 -0700
commit: 6da1069696186572c66cbd83947c1a1dbd2bc827 (patch)
tree: 5bb7ed475b06d0025f3ae377f9b7fade5017843f /mllib/src/main
parent: ad0954f6de29761e0e7e543212c5bfe1fdcbed9f (diff)
download: spark-6da1069696186572c66cbd83947c1a1dbd2bc827.tar.gz
spark-6da1069696186572c66cbd83947c1a1dbd2bc827.tar.bz2
spark-6da1069696186572c66cbd83947c1a1dbd2bc827.zip
2 files changed, 4 insertions, 4 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala b/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
index 8fc9860566..89718e0f3e 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
@@ -355,9 +355,9 @@ class LinearRegressionSummary private[regression] (
    */
   val r2: Double = metrics.r2
 
-  /** Residuals (predicted value - label value) */
+  /** Residuals (label - predicted value) */
   @transient lazy val residuals: DataFrame = {
-    val t = udf { (pred: Double, label: Double) => pred - label}
+    val t = udf { (pred: Double, label: Double) => label - pred }
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/tree/loss/SquaredError.scala b/mllib/src/main/scala/org/apache/spark/mllib/tree/loss/SquaredError.scala
index a5582d3ef3..011a5d5742 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/tree/loss/SquaredError.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/tree/loss/SquaredError.scala
@@ -42,11 +42,11 @@ object SquaredError extends Loss {
    * @return Loss gradient
    */
   override def gradient(prediction: Double, label: Double): Double = {
-    2.0 * (prediction - label)
+    - 2.0 * (label - prediction)
   }
 
   override private[mllib] def computeError(prediction: Double, label: Double): Double = {
-    val err = prediction - label
+    val err = label - prediction
     err * err
   }
 }
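
Note on the SquaredError hunk: the rewritten expressions appear to be algebraically identical to the old ones, so only `LinearRegressionSummary.residuals` changes sign in its output. A short derivation, where the symbols $y$ (label), $\hat{y}$ (prediction), $r$ (residual), and $L$ (squared-error loss) are notation introduced here for illustration and do not appear in the commit:

```latex
\begin{align*}
r &= y - \hat{y} \\
L(\hat{y}, y) &= r^{2} = (y - \hat{y})^{2} = (\hat{y} - y)^{2} \\
\frac{\partial L}{\partial \hat{y}} &= 2\,(y - \hat{y}) \cdot \frac{\partial (y - \hat{y})}{\partial \hat{y}}
  = -2\,(y - \hat{y}) = 2\,(\hat{y} - y)
\end{align*}
```

Under this reading, `gradient` returns the same value before and after the change (`- 2.0 * (label - prediction)` equals `2.0 * (prediction - label)`), and `computeError` is unchanged because squaring removes the sign of `err`.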