author    DB Tsai <dbt@netflix.com>    2015-07-08 15:21:58 -0700
committer DB Tsai <dbt@netflix.com>    2015-07-08 15:21:58 -0700
commit    57221934e0376e5bb8421dc35d4bf91db4deeca1 (patch)
tree      d7736dda417fa4dae7b61c1bfa63da62413cb030 /ec2
parent    00b265f12c0f0271b7036f831fee09b694908b29 (diff)
[SPARK-8700][ML] Disable feature scaling in Logistic Regression
All compressed-sensing applications, and some regression use cases, get better results with feature scaling turned off. However, implementing this naively, by training on the dataset without any standardization, leads to a poor rate of convergence. Instead, we can still standardize the training dataset but penalize each component differently, yielding effectively the same objective function as the unstandardized problem while keeping it numerically well-conditioned. As a result, columns with high variance are penalized less, and vice versa; without this reweighting, all features, being standardized, would be penalized equally.

In R, there is an option for this:

> `standardize`: Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize. See details below for y standardization with family="gaussian".

+cc holdenk mengxr jkbradley

Author: DB Tsai <dbt@netflix.com>

Closes #7080 from dbtsai/lors and squashes the following commits:

877e6c7 [DB Tsai] repahse the doc
7cf45f2 [DB Tsai] address feedback
78d75c9 [DB Tsai] small change
c2c9e60 [DB Tsai] style
6e1a8e0 [DB Tsai] first commit
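The reweighting trick the commit message describes can be sketched numerically. This is an illustrative demo, not Spark's actual implementation: ridge regression is used in place of logistic regression because its closed form makes the equivalence easy to verify, centering is omitted for brevity, and all names and data below are made up. The idea is the same: standardize features for conditioning, but divide the L2 penalty on component j by sigma_j^2 so that the objective matches the unstandardized problem exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
# Columns on very different scales, as in un-standardized real data.
X = rng.normal(size=(n, d)) * np.array([1.0, 10.0, 0.1])
y = X @ np.array([0.5, 0.05, 5.0]) + 0.1 * rng.normal(size=n)

def ridge(X, y, penalty):
    """Minimize ||Xw - y||^2 / (2n) + 0.5 * sum_j penalty_j * w_j^2."""
    n = len(y)
    return np.linalg.solve(X.T @ X / n + np.diag(penalty), X.T @ y / n)

lam = 0.1
sigma = X.std(axis=0)

# (a) The target objective: raw features, one uniform penalty lam.
w_raw = ridge(X, y, np.full(d, lam))

# (b) Standardized features, but component j is penalized by
#     lam / sigma_j^2; coefficients are mapped back to the original
#     scale afterwards (w_j = w_std_j / sigma_j).
w_std = ridge(X / sigma, y, lam / sigma**2)
w_back = w_std / sigma

# High-variance columns receive a smaller effective penalty, and the
# two solutions coincide, so the better-conditioned problem (b) solves
# the same objective as (a).
print(np.allclose(w_raw, w_back))  # → True
```

Because the standardized coefficients relate to the original ones by `w_std_j = sigma_j * w_j`, substituting into the penalty `lam / sigma_j^2 * w_std_j^2` recovers exactly `lam * w_j^2`; the two objectives are identical term by term, which is why the minimizers agree.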
Diffstat (limited to 'ec2')
0 files changed, 0 insertions, 0 deletions