author    Liang-Chi Hsieh <viirya@gmail.com>    2015-01-06 21:23:31 -0800
committer Xiangrui Meng <meng@databricks.com>   2015-01-06 21:23:31 -0800
commit    e21acc1978a6f4a57ef2e08490692b0ffe05fa9e (patch)
tree      33f0337d96694b28c737a30d7f2eb57173e22267
parent    bb38ebb1abd26b57525d7d29703fd449e40cd6de (diff)
download  spark-e21acc1978a6f4a57ef2e08490692b0ffe05fa9e.tar.gz
          spark-e21acc1978a6f4a57ef2e08490692b0ffe05fa9e.tar.bz2
          spark-e21acc1978a6f4a57ef2e08490692b0ffe05fa9e.zip
[SPARK-5099][Mllib] Simplify logistic loss function
This is a minor PR: we can simply take the negation of `margin` instead of subtracting `margin`. Mathematically the two are equal, but the modified equation is the common form of the logistic loss function and so more readable. It also computes a more accurate value, as some quick tests show.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #3899 from viirya/logit_func and squashes the following commits:

91a3860 [Liang-Chi Hsieh] Modified for comment.
0aa51e4 [Liang-Chi Hsieh] Further simplified.
72a295e [Liang-Chi Hsieh] Revert LogLoss back and add more considerations in Logistic Loss.
a3f83ca [Liang-Chi Hsieh] Fix a bug.
2bc5712 [Liang-Chi Hsieh] Simplify loss function.
-rw-r--r-- mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala | 12
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala b/mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala
index 5a419d1640..aaacf3a8a2 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala
@@ -64,11 +64,17 @@ class LogisticGradient extends Gradient {
val gradientMultiplier = (1.0 / (1.0 + math.exp(margin))) - label
val gradient = data.copy
scal(gradientMultiplier, gradient)
+ val minusYP = if (label > 0) margin else -margin
+
+ // log1p is log(1+p) but more accurate for small p
+ // Following two equations are the same analytically but not numerically, e.g.,
+ // math.log1p(math.exp(1000)) == Infinity
+ // 1000 + math.log1p(math.exp(-1000)) == 1000.0
val loss =
- if (label > 0) {
- math.log1p(math.exp(margin)) // log1p is log(1+p) but more accurate for small p
+ if (minusYP < 0) {
+ math.log1p(math.exp(minusYP))
} else {
- math.log1p(math.exp(margin)) - margin
+ math.log1p(math.exp(-minusYP)) + minusYP
}
(gradient, loss)
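The numerical argument in the patch's comment can be illustrated with a small standalone sketch. This is a hypothetical Scala object (the names `LogisticLossStability` and `log1pExp` are illustrative, not part of the patch) showing why `math.log1p(math.exp(x))` overflows for large `x`, while the algebraically equivalent form `x + log1p(exp(-x))` used by the patch stays finite:

```scala
object LogisticLossStability {
  // Numerically stable log(1 + exp(x)).
  // For large positive x, math.exp(x) overflows to Infinity, so we
  // rewrite log(1 + exp(x)) as x + log(1 + exp(-x)), which is
  // analytically identical but never exponentiates a large argument.
  // This mirrors the branch on minusYP in the patched LogisticGradient.
  def log1pExp(x: Double): Double =
    if (x < 0) math.log1p(math.exp(x))
    else x + math.log1p(math.exp(-x))

  def main(args: Array[String]): Unit = {
    // Naive form overflows: exp(1000) == Infinity, so log1p(...) == Infinity.
    println(math.log1p(math.exp(1000.0)))

    // Stable form: 1000 + log1p(exp(-1000)) == 1000.0, since exp(-1000)
    // underflows to 0 and log1p(0) == 0.
    println(log1pExp(1000.0))

    // For moderate arguments, both forms agree.
    println(log1pExp(-5.0))
    println(math.log1p(math.exp(-5.0)))
  }
}
```

With `minusYP = -y * margin` (for labels y in {+1, -1}), the loss `log(1 + exp(minusYP))` is then evaluated by whichever branch keeps the exponent non-positive, which is exactly what the two-case expression in the diff does.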