author Naftali Harris <naftaliharris@gmail.com> 2014-07-30 09:56:59 -0700
committer Xiangrui Meng <meng@databricks.com> 2014-07-30 09:56:59 -0700
commit e3d85b7e40073b05e2588583e9d8db11366c2f7b
tree 8691dfd4ee050bbc60ffa3489c9b1b188bb1807a /python/pyspark/mllib/classification.py
parent 3bc3f1801e3347e02cbecdd8e941003430155da2
Avoid numerical instability
This avoids basically doing 1 - 1, for example:

```python
>>> from math import exp
>>> margin = -40
>>> 1 - 1 / (1 + exp(margin))
0.0
>>> exp(margin) / (1 + exp(margin))
4.248354255291589e-18
```

Author: Naftali Harris <naftaliharris@gmail.com>

Closes #1652 from naftaliharris/patch-2 and squashes the following commits:

0d55a9f [Naftali Harris] Avoid numerical instability
Diffstat (limited to 'python/pyspark/mllib/classification.py')
-rw-r--r-- python/pyspark/mllib/classification.py 3
1 files changed, 2 insertions, 1 deletions
diff --git a/python/pyspark/mllib/classification.py b/python/pyspark/mllib/classification.py
index 9e28dfbb91..2bbb9c3fca 100644
--- a/python/pyspark/mllib/classification.py
+++ b/python/pyspark/mllib/classification.py
@@ -66,7 +66,8 @@ class LogisticRegressionModel(LinearModel):
         if margin > 0:
             prob = 1 / (1 + exp(-margin))
         else:
-            prob = 1 - 1 / (1 + exp(margin))
+            exp_margin = exp(margin)
+            prob = exp_margin / (1 + exp_margin)
         return 1 if prob > 0.5 else 0
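
The effect of the patch can be checked in isolation with a small standalone sketch. The function names `stable_sigmoid` and `naive_negative_branch` are hypothetical and do not appear in `classification.py`; they only mirror the two branches of the hunk above. The two forms are algebraically equal, since 1 - 1 / (1 + e^m) = e^m / (1 + e^m), but the first subtracts two nearly equal numbers when the margin is very negative.

```python
from math import exp


def stable_sigmoid(margin):
    """Logistic function, branching on the sign of margin as the patch does.

    Hypothetical helper for illustration; not part of pyspark.
    """
    if margin > 0:
        # exp(-margin) lies in (0, 1), so it cannot overflow.
        return 1 / (1 + exp(-margin))
    # exp(margin) lies in (0, 1]; no subtraction from 1 is involved,
    # so very small probabilities keep their significant digits.
    exp_margin = exp(margin)
    return exp_margin / (1 + exp_margin)


def naive_negative_branch(margin):
    """The pre-patch expression used for margin <= 0."""
    # For very negative margins, 1 / (1 + exp(margin)) rounds to exactly 1.0,
    # and the subtraction then yields 0.0.
    return 1 - 1 / (1 + exp(margin))


if __name__ == "__main__":
    for m in (-1, -20, -40):
        print(m, naive_negative_branch(m), stable_sigmoid(m))
```

With `margin = -40`, the naive form returns `0.0` while the patched form returns roughly `4.25e-18`, matching the interpreter session in the commit message.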