[SPARK-17001][ML] Enable standardScaler to standardize sparse vectors when withMean=True

## What changes were proposed in this pull request?

Allow centering / mean scaling of sparse vectors in StandardScaler, if requested. This is for compatibility with `VectorAssembler` in common usages.

## How was this patch tested?

Jenkins tests, including new cases to reflect the new behavior.

Author: Sean Owen <sowen@cloudera.com>

Closes #14663 from srowen/SPARK-17001.
author: Sean Owen <sowen@cloudera.com> 2016-08-27 08:48:56 +0100
committer: Sean Owen <sowen@cloudera.com> 2016-08-27 08:48:56 +0100
commit: e07baf14120bc94b783649dabf5fffea58bff0de (patch)
tree: 557979925874c18034e793057a9706c3ee6924fa /python/pyspark/mllib
parent: 9fbced5b25c2f24d50c50516b4b7737f7e3eaf86 (diff)
download: spark-e07baf14120bc94b783649dabf5fffea58bff0de.tar.gz
spark-e07baf14120bc94b783649dabf5fffea58bff0de.tar.bz2
spark-e07baf14120bc94b783649dabf5fffea58bff0de.zip
1 files changed, 2 insertions, 3 deletions
diff --git a/python/pyspark/mllib/feature.py b/python/pyspark/mllib/feature.py
index c8a6e33f4d..324ba9758e 100644
--- a/python/pyspark/mllib/feature.py
+++ b/python/pyspark/mllib/feature.py
@@ -208,9 +208,8 @@ class StandardScaler(object):
     training set.
 
     :param withMean: False by default. Centers the data with mean
-                     before scaling. It will build a dense output, so this
-                     does not work on sparse input and will raise an
-                     exception.
+                     before scaling. It will build a dense output, so take
+                     care when applying to sparse input.
     :param withStd: True by default. Scales the data to unit
                     standard deviation.
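
The docstring's warning that `withMean` "will build a dense output" can be illustrated with a small, Spark-free sketch. This is not the Spark implementation: the helper names `fit_mean_std` and `standardize`, the dict-based sparse representation, the use of the sample standard deviation (dividing by n - 1), and the mapping of zero-variance features to 0.0 are all assumptions made for the illustration.

```python
import math


def fit_mean_std(rows, size):
    """Per-feature mean and sample standard deviation (hypothetical helper).

    Each row is a sparse vector stored as {index: value}; absent indices are 0.
    Dividing by n - 1 is an assumption of this sketch.
    """
    n = len(rows)
    mean = [sum(r.get(j, 0.0) for r in rows) / n for j in range(size)]
    std = [
        math.sqrt(sum((r.get(j, 0.0) - mean[j]) ** 2 for r in rows) / (n - 1))
        for j in range(size)
    ]
    return mean, std


def standardize(row, size, mean, std, with_mean=False, with_std=True):
    """Standardize one sparse row (hypothetical helper, not Spark's API)."""

    def scale(j, v):
        if not with_std:
            return v
        # Assumption: a zero-variance feature is mapped to 0.0.
        return v / std[j] if std[j] != 0.0 else 0.0

    if with_mean:
        # Centering turns implicit zeros into -mean[j], so every position
        # must be materialized: the result is a dense list.
        return [scale(j, row.get(j, 0.0) - mean[j]) for j in range(size)]
    # Scaling alone keeps zeros at zero, so the result can stay sparse.
    return {j: scale(j, v) for j, v in row.items()}


rows = [{0: 1.0}, {0: 3.0, 2: 4.0}]
size = 3
mean, std = fit_mean_std(rows, size)

dense = standardize(rows[0], size, mean, std, with_mean=True)
sparse_out = standardize(rows[0], size, mean, std, with_mean=False)
print(dense)
print(sparse_out)
```

In this sketch the first row stores a single entry, yet the centered result has a nonzero value at index 2 because the implicit zero there is shifted by the feature mean. On wide, very sparse data this densification is where the memory cost comes from, which is the reason for the "take care" wording in the updated docstring.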