author: DB Tsai <dbtsai@alpinenow.com> 2014-08-03 21:39:21 -0700
committer: Xiangrui Meng <meng@databricks.com> 2014-08-03 21:39:21 -0700
commit: ae58aea2d1435b5bb011e68127e1bcddc2edf5b2
tree: ca1a5c60fa45714f8429aed9f96f719c553e92bc /streaming
parent: 5507dd8e18fbb52d5e0c64a767103b2418cb09c6
SPARK-2272 [MLlib] Feature scaling which standardizes the range of independent variables or features of data
Feature scaling standardizes the range of the independent variables (features) of a data set. It is generally performed during the data preprocessing step.

This work defines a trait called `VectorTransformer` for generic transformations on a vector. It has one method to implement, `transform`, which applies the transformation to a vector.

There are currently two implementations of `VectorTransformer`, and both can be easily extended with PMML transformation support.

1) `StandardScaler` - Standardizes features by removing the mean and scaling to unit variance, using column summary statistics on the samples in the training set.
2) `Normalizer` - Normalizes samples individually to unit L^n norm.

Author: DB Tsai <dbtsai@alpinenow.com>

Closes #1207 from dbtsai/dbtsai-feature-scaling and squashes the following commits:

78c15d3 [DB Tsai] Alpine Data Labs
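
The design described in the commit message can be illustrated with a small, dependency-free sketch. This is not the MLlib code from this commit: it uses `Array[Double]` in place of MLlib's vector type, the class names `SimpleStandardScaler` and `SimpleNormalizer` are hypothetical, and the choice of the sample standard deviation (dividing by n - 1) and the handling of zero-variance columns are assumptions made for the example.

```scala
// Simplified stand-in for the trait described in the commit message.
// Assumption: plain Array[Double] replaces MLlib's vector type.
trait VectorTransformer {
  def transform(vector: Array[Double]): Array[Double]
}

// Hypothetical scaler: column means and standard deviations are computed
// once from the training samples, then applied to each input vector.
class SimpleStandardScaler(samples: Seq[Array[Double]]) extends VectorTransformer {
  require(samples.size > 1, "need at least two training samples")
  private val n = samples.size
  private val dim = samples.head.length

  private val mean: Array[Double] =
    Array.tabulate(dim)(j => samples.map(_(j)).sum / n)

  // Assumption: sample standard deviation (divides by n - 1).
  private val std: Array[Double] =
    Array.tabulate(dim) { j =>
      math.sqrt(samples.map(s => math.pow(s(j) - mean(j), 2)).sum / (n - 1))
    }

  override def transform(vector: Array[Double]): Array[Double] =
    Array.tabulate(dim) { j =>
      // Assumption: constant columns (std == 0) map to 0.0 to avoid division by zero.
      if (std(j) == 0.0) 0.0 else (vector(j) - mean(j)) / std(j)
    }
}

// Hypothetical normalizer: rescales a single sample to unit L^p norm.
class SimpleNormalizer(p: Double = 2.0) extends VectorTransformer {
  require(p >= 1.0, "p must be >= 1")

  override def transform(vector: Array[Double]): Array[Double] = {
    val norm = math.pow(vector.map(x => math.pow(math.abs(x), p)).sum, 1.0 / p)
    // A zero vector has no direction to preserve, so it is returned unchanged.
    if (norm == 0.0) vector.clone() else vector.map(_ / norm)
  }
}
```

The sketch keeps the split the message describes: `SimpleStandardScaler` depends on statistics gathered from a training set, while `SimpleNormalizer` works on each sample in isolation, and both are used through the same `transform` method.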
Diffstat (limited to 'streaming')
0 files changed, 0 insertions, 0 deletions