SPARK-2272 [MLlib] Feature scaling which standardizes the range of independent variables or features of data - spark


author	DB Tsai <dbtsai@alpinenow.com>	2014-08-03 21:39:21 -0700
committer	Xiangrui Meng <meng@databricks.com>	2014-08-03 21:39:21 -0700
commit	ae58aea2d1435b5bb011e68127e1bcddc2edf5b2 (patch)
tree	ca1a5c60fa45714f8429aed9f96f719c553e92bc /streaming
parent	5507dd8e18fbb52d5e0c64a767103b2418cb09c6 (diff)

SPARK-2272 [MLlib] Feature scaling which standardizes the range of independent variables or features of data

Feature scaling is a method used to standardize the range of independent variables or features of data. In data processing, it is generally performed during the data preprocessing step.

In this work, a trait called `VectorTransformer` is defined for generic transformation on a vector. It contains one method to be implemented, `transform`, which applies a transformation to a vector.

There are currently two implementations of `VectorTransformer`, and both can be easily extended with PMML transformation support.

1) `StandardScaler` - Standardizes features by removing the mean and scaling to unit variance, using column summary statistics on the samples in the training set.

2) `Normalizer` - Normalizes samples individually to unit L^n norm.

Author: DB Tsai <dbtsai@alpinenow.com>

Closes #1207 from dbtsai/dbtsai-feature-scaling and squashes the following commits:

78c15d3 [DB Tsai] Alpine Data Labs
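For readers who want to see the two transformations described above in concrete terms, here is a minimal plain-Python sketch. It is not the Scala code from this commit and does not use the MLlib API: the class and method names simply mirror the names in the commit message, and the `fit` method, the use of the sample (n - 1) variance, and the handling of zero-variance columns and zero-norm vectors are assumptions made for illustration.

```python
import math
from typing import List, Sequence


class VectorTransformer:
    """Hypothetical stand-in for the trait: a single method, `transform`."""

    def transform(self, vector: Sequence[float]) -> List[float]:
        raise NotImplementedError


class StandardScaler(VectorTransformer):
    """Removes the per-column mean and scales each column to unit variance."""

    def fit(self, samples: Sequence[Sequence[float]]) -> "StandardScaler":
        n = len(samples)
        if n < 2:
            raise ValueError("need at least two samples to estimate variance")
        dim = len(samples[0])
        # Column summary statistics computed over the training samples.
        self.mean = [sum(row[j] for row in samples) / n for j in range(dim)]
        # Assumption: sample standard deviation with an (n - 1) denominator.
        self.std = [
            math.sqrt(sum((row[j] - self.mean[j]) ** 2 for row in samples) / (n - 1))
            for j in range(dim)
        ]
        return self

    def transform(self, vector: Sequence[float]) -> List[float]:
        # Assumption: a column with zero variance is mapped to 0.0.
        return [
            (x - m) / s if s != 0.0 else 0.0
            for x, m, s in zip(vector, self.mean, self.std)
        ]


class Normalizer(VectorTransformer):
    """Scales a single sample so that its L^p norm equals 1."""

    def __init__(self, p: float = 2.0) -> None:
        if p < 1.0:
            raise ValueError("p must be at least 1")
        self.p = p

    def transform(self, vector: Sequence[float]) -> List[float]:
        norm = sum(abs(x) ** self.p for x in vector) ** (1.0 / self.p)
        # Assumption: an all-zero vector is returned unchanged.
        if norm == 0.0:
            return list(vector)
        return [x / norm for x in vector]


# Usage: fit the scaler on training samples, then transform one vector at a time.
training = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaler = StandardScaler().fit(training)
scaled = scaler.transform([3.0, 30.0])
unit = Normalizer(p=2.0).transform([3.0, 4.0])
```

The sketch reflects the distinction drawn in the commit message: `StandardScaler` depends on statistics gathered across the training set, so it needs a fitting step, while `Normalizer` works on each sample independently and needs none.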

Diffstat (limited to 'streaming')

0 files changed, 0 insertions, 0 deletions

