author    | Sean Owen <srowen@gmail.com> | 2014-07-30 08:55:15 -0700
committer | Xiangrui Meng <meng@databricks.com> | 2014-07-30 08:55:15 -0700
commit    | ee07541e99f0d262bf662b669b6542cf302ff39c (patch)
tree      | 005f8a40502e5868cdcfcd4b9afc868f7951700c /graphx/src
parent    | 7c5fc28af42daaa6725af083d78c2372f3d0a338 (diff)
SPARK-2748 [MLLIB] [GRAPHX] Loss of precision for small arguments to Math.exp, Math.log
In a few places in MLlib, an expression of the form `log(1.0 + p)` is evaluated. When p is so small that `1.0 + p == 1.0`, the result is 0.0. However the correct answer is very near `p`. This is why `Math.log1p` exists.
Similarly for one instance of `exp(m) - 1` in GraphX; there's a special `Math.expm1` method.
The errors occur only for very small arguments, but given how these expressions are used in machine learning algorithms, arguments that small are entirely possible.
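The precision loss described above is easy to demonstrate. The following standalone sketch (not part of the patch) compares the naive expressions against `math.log1p` and `math.expm1` for an argument small enough that `1.0 + p` rounds to exactly `1.0` in double precision:

```scala
// Tiny p: 1.0 + p rounds to exactly 1.0 in IEEE double precision
val p = 1e-20

val naiveLog = math.log(1.0 + p)   // 0.0: p vanished in the addition before log ran
val goodLog  = math.log1p(p)       // ~1e-20: computed without ever forming 1.0 + p

val naiveExp = math.exp(p) - 1.0   // 0.0: exp(p) rounded to exactly 1.0
val goodExp  = math.expm1(p)       // ~1e-20

println(s"log(1+p): naive=$naiveLog  log1p=$goodLog")
println(s"exp(p)-1: naive=$naiveExp  expm1=$goodExp")
```

The naive forms return exactly `0.0`, while `log1p`/`expm1` recover a result very near `p`, which is the correct answer to first order.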
Also note the related PR for Python: https://github.com/apache/spark/pull/1652
Author: Sean Owen <srowen@gmail.com>
Closes #1659 from srowen/SPARK-2748 and squashes the following commits:
c5926d4 [Sean Owen] Use log1p, expm1 for better precision for tiny arguments
Diffstat (limited to 'graphx/src')
-rw-r--r-- | graphx/src/main/scala/org/apache/spark/graphx/util/GraphGenerators.scala | 6
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/graphx/src/main/scala/org/apache/spark/graphx/util/GraphGenerators.scala b/graphx/src/main/scala/org/apache/spark/graphx/util/GraphGenerators.scala
index 635514f09e..60149548ab 100644
--- a/graphx/src/main/scala/org/apache/spark/graphx/util/GraphGenerators.scala
+++ b/graphx/src/main/scala/org/apache/spark/graphx/util/GraphGenerators.scala
@@ -100,8 +100,10 @@ object GraphGenerators {
    */
   private def sampleLogNormal(mu: Double, sigma: Double, maxVal: Int): Int = {
     val rand = new Random()
-    val m = math.exp(mu + (sigma * sigma) / 2.0)
-    val s = math.sqrt((math.exp(sigma*sigma) - 1) * math.exp(2*mu + sigma*sigma))
+    val sigmaSq = sigma * sigma
+    val m = math.exp(mu + sigmaSq / 2.0)
+    // expm1 is exp(x)-1 with better accuracy for tiny x
+    val s = math.sqrt(math.expm1(sigmaSq) * math.exp(2*mu + sigmaSq))
     // Z ~ N(0, 1)
     var X: Double = maxVal
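To see why this change matters in the GraphX code specifically, the sketch below (standalone, with `mu`/`sigma`/`sigmaSq` named after the diff) evaluates the log-normal standard deviation `s = sqrt((exp(sigma^2) - 1) * exp(2*mu + sigma^2))` both ways for a tiny `sigma`:

```scala
val mu = 0.0
val sigma = 1e-9
val sigmaSq = sigma * sigma        // 1e-18, far below one ulp of 1.0

// Old form: exp(1e-18) rounds to exactly 1.0, so the subtraction yields 0.0
val sNaive = math.sqrt((math.exp(sigmaSq) - 1) * math.exp(2 * mu + sigmaSq))
// Patched form: expm1 preserves the tiny value, giving s ~= sigma as expected
val sFixed = math.sqrt(math.expm1(sigmaSq) * math.exp(2 * mu + sigmaSq))

println(s"naive s = $sNaive, expm1 s = $sFixed")
```

The old expression collapses to `0.0`, while the patched one returns a value close to `sigma`, matching the analytic limit of the log-normal standard deviation for small `sigma`.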