From e07baf14120bc94b783649dabf5fffea58bff0de Mon Sep 17 00:00:00 2001
From: Sean Owen
Date: Sat, 27 Aug 2016 08:48:56 +0100
Subject: [SPARK-17001][ML] Enable standardScaler to standardize sparse vectors
 when withMean=True

## What changes were proposed in this pull request?

Allow centering / mean scaling of sparse vectors in StandardScaler, if requested. This is for compatibility with `VectorAssembler` in common usages.

## How was this patch tested?

Jenkins tests, including new cases to reflect the new behavior.

Author: Sean Owen

Closes #14663 from srowen/SPARK-17001.
---
 python/pyspark/mllib/feature.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

(limited to 'python')

diff --git a/python/pyspark/mllib/feature.py b/python/pyspark/mllib/feature.py
index c8a6e33f4d..324ba9758e 100644
--- a/python/pyspark/mllib/feature.py
+++ b/python/pyspark/mllib/feature.py
@@ -208,9 +208,8 @@ class StandardScaler(object):
         training set.

         :param withMean: False by default. Centers the data with mean
-          before scaling. It will build a dense output, so this
-          does not work on sparse input and will raise an
-          exception.
+          before scaling. It will build a dense output, so take
+          care when applying to sparse input.

         :param withStd: True by default. Scales the data to unit
           standard deviation.
--
cgit v1.2.3
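The docstring change above reflects why mean-centering a sparse vector is costly: subtracting a nonzero mean turns every implicit zero into a nonzero, so the output is necessarily dense. The following is a minimal plain-NumPy sketch of that effect (an illustration only, not the actual MLlib implementation; `center_sparse` is a hypothetical helper):

```python
import numpy as np

def center_sparse(size, indices, values, mean):
    """Densify a sparse vector (given as size/indices/values) and
    subtract a per-feature mean. The result is dense: entries that
    were implicit zeros become -mean[j] wherever mean[j] != 0."""
    dense = np.zeros(size)
    dense[indices] = values
    return dense - mean

# A length-4 vector with a single nonzero entry at index 1:
centered = center_sparse(4, [1], [8.0], np.full(4, 2.0))
# All four slots are now nonzero; the former zeros became -2.0.
```

This is why the docstring now says "take care when applying to sparse input" rather than raising an exception: the operation is valid, but it loses the memory savings of the sparse representation.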