author | DB Tsai <dbtsai@alpinenow.com> | 2014-08-03 21:39:21 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-08-03 21:39:21 -0700 |
commit | ae58aea2d1435b5bb011e68127e1bcddc2edf5b2 | |
tree | ca1a5c60fa45714f8429aed9f96f719c553e92bc /streaming | |
parent | 5507dd8e18fbb52d5e0c64a767103b2418cb09c6 | |
SPARK-2272 [MLlib] Feature scaling which standardizes the range of independent variables or features of data
Feature scaling is a method used to standardize the range of independent variables or features of data. In data processing, it is generally performed during the data preprocessing step.
In this work, a trait called `VectorTransformer` is defined for generic transformations on a vector. It contains one method to be implemented, `transform`, which applies the transformation to a vector.
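The trait's contract is a single abstract `transform` method. A minimal Python analogue follows, assuming an abstract base class stands in for the Scala trait. This is not the Spark API. `transform_all` and `ScaleBy` are hypothetical helpers added only for illustration and are not part of this commit.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

Vector = List[float]


class VectorTransformer(ABC):
    """Generic transformation on a vector (analogue of the trait's one abstract method)."""

    @abstractmethod
    def transform(self, vector: Sequence[float]) -> Vector:
        """Apply the transformation to one vector and return a new vector."""

    def transform_all(self, vectors: Sequence[Sequence[float]]) -> List[Vector]:
        # Hypothetical convenience helper: apply transform to each vector in turn.
        return [self.transform(v) for v in vectors]


class ScaleBy(VectorTransformer):
    """Hypothetical example transformer: multiply every component by a constant."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transform(self, vector: Sequence[float]) -> Vector:
        return [self.factor * x for x in vector]


doubler = ScaleBy(2.0)
print(doubler.transform([1.0, -0.5, 3.0]))  # [2.0, -1.0, 6.0]
```

Callers depend only on `transform`, so any implementation can be substituted without changing them.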
There are currently two implementations of `VectorTransformer`, and both can easily be extended with PMML transformation support:
1) `StandardScaler` - Standardizes features by removing the mean and scaling to unit variance using column summary statistics on the samples in the training set.
2) `Normalizer` - Normalizes samples individually to unit L^n norm.
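The two transformations can be sketched in plain Python to show what each computes. `StandardScaler` maps each column to (x - mean) / std using statistics from a training set. `Normalizer` divides each sample by its own L^n norm, with the exponent written as `p` in the sketch. This is not Spark's MLlib API, which is Scala and operates on RDDs. The parameter names (`with_mean`, `with_std`, `p`), the `fit` method, and several choices are assumptions made here: using the sample variance (dividing by n - 1), giving zero-variance columns a scaling factor of 0.0, and returning zero vectors unchanged.

```python
import math
from typing import List, Sequence


class StandardScaler:
    """Column-wise standardization (x - mean) / std, statistics taken from a training set."""

    def __init__(self, with_mean: bool = True, with_std: bool = True) -> None:
        self.with_mean = with_mean
        self.with_std = with_std
        self.mean: List[float] = []
        self.factor: List[float] = []

    def fit(self, rows: Sequence[Sequence[float]]) -> "StandardScaler":
        n = len(rows)
        if n < 2:
            raise ValueError("need at least two samples to estimate variance")
        dim = len(rows[0])
        self.mean = [sum(r[j] for r in rows) / n for j in range(dim)]
        # Assumption: unbiased sample variance per column (divides by n - 1).
        var = [sum((r[j] - self.mean[j]) ** 2 for r in rows) / (n - 1) for j in range(dim)]
        # Assumption: zero-variance columns get factor 0.0 instead of dividing by zero.
        self.factor = [1.0 / math.sqrt(v) if v > 0 else 0.0 for v in var]
        return self

    def transform(self, vector: Sequence[float]) -> List[float]:
        out = list(vector)
        if self.with_mean:
            out = [x - m for x, m in zip(out, self.mean)]
        if self.with_std:
            out = [x * f for x, f in zip(out, self.factor)]
        return out


class Normalizer:
    """Scale each sample individually to unit L^p norm (p >= 1, or math.inf for max-norm)."""

    def __init__(self, p: float = 2.0) -> None:
        if p < 1:
            raise ValueError("p must be >= 1")
        self.p = p

    def transform(self, vector: Sequence[float]) -> List[float]:
        if math.isinf(self.p):
            norm = max((abs(x) for x in vector), default=0.0)
        else:
            norm = sum(abs(x) ** self.p for x in vector) ** (1.0 / self.p)
        # Assumption: a zero vector has no direction, so it is returned unchanged.
        if norm == 0.0:
            return list(vector)
        return [x / norm for x in vector]


train = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
scaler = StandardScaler().fit(train)
print(scaler.transform([3.0, 10.0]))  # [1.0, 0.0]
print(Normalizer(p=2.0).transform([3.0, 4.0]))  # [0.6, 0.8]
```

The two differ in what they look at. `StandardScaler` needs a pass over the training set to learn per-column statistics before it can transform anything. `Normalizer` is stateless and only looks at the sample in hand.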
Author: DB Tsai <dbtsai@alpinenow.com>
Closes #1207 from dbtsai/dbtsai-feature-scaling and squashes the following commits:
78c15d3 [DB Tsai] Alpine Data Labs