author     Sean Owen <srowen@gmail.com>           2014-07-30 08:55:15 -0700
committer  Xiangrui Meng <meng@databricks.com>    2014-07-30 08:55:15 -0700
commit     ee07541e99f0d262bf662b669b6542cf302ff39c (patch)
tree       005f8a40502e5868cdcfcd4b9afc868f7951700c /mllib
parent     7c5fc28af42daaa6725af083d78c2372f3d0a338 (diff)
SPARK-2748 [MLLIB] [GRAPHX] Loss of precision for small arguments to Math.exp, Math.log
In a few places in MLlib, an expression of the form `log(1.0 + p)` is evaluated. When p is so small that `1.0 + p == 1.0`, the result is 0.0, even though the correct answer is very nearly `p`. This is why `Math.log1p` exists. Similarly, one instance of `exp(m) - 1` in GraphX can use the special `Math.expm1` method. While the errors occur only for very small arguments, such values are entirely possible given these expressions' use in machine learning algorithms.

Also note the related PR for Python: https://github.com/apache/spark/pull/1652

Author: Sean Owen <srowen@gmail.com>

Closes #1659 from srowen/SPARK-2748 and squashes the following commits:

c5926d4 [Sean Owen] Use log1p, expm1 for better precision for tiny arguments
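To make the precision loss concrete, here is a minimal standalone sketch (plain Scala, not part of the patch; the constant `p` is an arbitrary tiny value chosen for illustration):

```scala
object Log1pDemo extends App {
  val p = 1e-20              // so small that 1.0 + p == 1.0 in double arithmetic
  println(math.log(1 + p))   // 0.0 -- all information about p is lost
  println(math.log1p(p))     // ~1.0E-20 -- very near the true answer, p

  println(math.exp(p) - 1)   // 0.0 -- the same cancellation for exp(m) - 1
  println(math.expm1(p))     // ~1.0E-20 -- accurate
}
```

For `|p|` below roughly 1e-16 (half the double-precision machine epsilon), `1.0 + p` rounds to exactly `1.0`, so `log(1.0 + p)` computes `log(1.0) = 0.0`; `log1p` and `expm1` avoid forming that intermediate sum and retain full relative precision.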
Diffstat (limited to 'mllib')
-rw-r--r--  mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala b/mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala
index 679842f831..9d82f011e6 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala
@@ -68,9 +68,9 @@ class LogisticGradient extends Gradient {
val gradient = brzData * gradientMultiplier
val loss =
if (label > 0) {
- math.log(1 + math.exp(margin))
+ math.log1p(math.exp(margin)) // log1p is log(1+p) but more accurate for small p
} else {
- math.log(1 + math.exp(margin)) - margin
+ math.log1p(math.exp(margin)) - margin
}
(Vectors.fromBreeze(gradient), loss)
@@ -89,9 +89,9 @@ class LogisticGradient extends Gradient {
brzAxpy(gradientMultiplier, brzData, cumGradient.toBreeze)
if (label > 0) {
- math.log(1 + math.exp(margin))
+ math.log1p(math.exp(margin))
} else {
- math.log(1 + math.exp(margin)) - margin
+ math.log1p(math.exp(margin)) - margin
}
}
}
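For context on why the change above matters: when `margin` is strongly negative, `math.exp(margin)` is tiny, and the original expression rounds the loss to exactly 0.0. A hedged sketch (plain Scala, with a hypothetical margin value, not the actual Spark code path):

```scala
val margin = -50.0                           // hypothetical, strongly negative margin
val before = math.log(1 + math.exp(margin))  // 0.0: 1.0 + ~1.9e-22 rounds to 1.0
val after  = math.log1p(math.exp(margin))    // ~1.93E-22, close to the true loss
```

The gradient computation is unchanged by this patch; only the loss expression gains precision, which can matter when very small per-example losses are accumulated across many examples.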