[SPARK-7780][MLLIB] intercept in LogisticRegressionWithLBFGS should not be regularized - spark

author	Holden Karau <holden@us.ibm.com>	2016-01-26 17:59:05 -0800
committer	DB Tsai <dbt@netflix.com>	2016-01-26 17:59:05 -0800
commit	b72611f20a03c790b6fd341b6ffdb3b5437609ee (patch)
tree	89275beeab22511f74526b54f6c02022d429f5fe /R
parent	555127387accdd7c1cf236912941822ba8af0a52 (diff)
download	spark-b72611f20a03c790b6fd341b6ffdb3b5437609ee.tar.gz spark-b72611f20a03c790b6fd341b6ffdb3b5437609ee.tar.bz2 spark-b72611f20a03c790b6fd341b6ffdb3b5437609ee.zip

[SPARK-7780][MLLIB] intercept in LogisticRegressionWithLBFGS should not be regularized

The intercept in logistic regression represents a prior on the categories and should not be regularized. In MLlib, regularization is handled through the Updater, which penalizes all components without excluding the intercept; this results in poor training accuracy when regularization is enabled. The new implementation in the ML framework handles this properly, so MLlib should call the ML implementation, since the majority of users are still using the MLlib API. Note that both implementations perform feature scaling to improve convergence; the only difference is that the ML version does not regularize the intercept. As a result, when lambda is zero, they converge to the same solution.

Previously partially reviewed at https://github.com/apache/spark/pull/6386#issuecomment-168781424; re-opening for dbtsai to review.

Author: Holden Karau <holden@us.ibm.com>
Author: Holden Karau <holden@pigscanfly.ca>

Closes #10788 from holdenk/SPARK-7780-intercept-in-logisticregressionwithLBFGS-should-not-be-regularized.
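To illustrate why penalizing the intercept hurts, here is a minimal sketch in plain Python. It is not Spark's code: it assumes a single feature, uses plain gradient descent instead of L-BFGS, skips feature scaling, and all names (`fit_logistic`, `mean_probability`, `XS`, `YS`) are hypothetical. It shows that when the intercept is left out of the L2 penalty, the average predicted probability matches the observed class frequency (the "prior"), whereas penalizing it pulls the predictions away from that frequency. With a regularization parameter of zero the two variants coincide.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, reg_param, penalize_intercept, steps=5000, lr=0.5):
    """Minimize mean log-loss + (reg_param / 2) * w^2 by gradient descent.

    If penalize_intercept is True, (reg_param / 2) * b^2 is added as well,
    mimicking an updater that regularizes every component.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        grad_w = grad_w / n + reg_param * w
        grad_b = grad_b / n
        if penalize_intercept:
            grad_b += reg_param * b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mean_probability(xs, w, b):
    # Average predicted probability of the positive class.
    return sum(sigmoid(w * x + b) for x in xs) / len(xs)


# Toy, non-separable data with an imbalanced label distribution (70% positive).
XS = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
YS = [0, 1, 0, 1, 1, 1, 0, 1, 1, 1]

if __name__ == "__main__":
    positive_rate = sum(YS) / len(YS)
    for penalize in (False, True):
        w, b = fit_logistic(XS, YS, reg_param=1.0, penalize_intercept=penalize)
        print(
            "penalize_intercept=%s w=%.4f b=%.4f mean_p=%.4f positive_rate=%.2f"
            % (penalize, w, b, mean_probability(XS, w, b), positive_rate)
        )
```

The design point is that the intercept's gradient, when unpenalized, is exactly the gap between the mean prediction and the mean label, so the optimum closes that gap regardless of how strongly the weights are shrunk.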

Diffstat (limited to 'R')

0 files changed, 0 insertions, 0 deletions

