From e07baf14120bc94b783649dabf5fffea58bff0de Mon Sep 17 00:00:00 2001
From: Sean Owen
Date: Sat, 27 Aug 2016 08:48:56 +0100
Subject: [SPARK-17001][ML] Enable standardScaler to standardize sparse vectors
 when withMean=True

## What changes were proposed in this pull request?

Allow centering / mean scaling of sparse vectors in StandardScaler, if requested. This is for compatibility with `VectorAssembler` in common usages.

## How was this patch tested?

Jenkins tests, including new cases to reflect the new behavior.

Author: Sean Owen

Closes #14663 from srowen/SPARK-17001.
---
 python/pyspark/mllib/feature.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

(limited to 'python')

diff --git a/python/pyspark/mllib/feature.py b/python/pyspark/mllib/feature.py
index c8a6e33f4d..324ba9758e 100644
--- a/python/pyspark/mllib/feature.py
+++ b/python/pyspark/mllib/feature.py
@@ -208,9 +208,8 @@ class StandardScaler(object):
         training set.

         :param withMean: False by default. Centers the data with mean
-          before scaling. It will build a dense output, so this
-          does not work on sparse input and will raise an
-          exception.
+          before scaling. It will build a dense output, so take
+          care when applying to sparse input.

         :param withStd: True by default. Scales the data to unit
           standard deviation.
--
cgit v1.2.3
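The docstring change above reflects why mean-centering a sparse vector is costly: subtracting a nonzero mean turns every implicit zero into a nonzero, so the output is necessarily dense. The following is a minimal plain-NumPy sketch of that effect (an illustration only, not the actual MLlib implementation; `center_sparse` is a hypothetical helper):

```python
import numpy as np

def center_sparse(size, indices, values, mean):
    """Densify a sparse vector (given as size/indices/values) and
    subtract a per-feature mean. The result is dense: entries that
    were implicit zeros become -mean[j] wherever mean[j] != 0."""
    dense = np.zeros(size)
    dense[indices] = values
    return dense - mean

# A length-4 vector with a single nonzero entry at index 1:
centered = center_sparse(4, [1], [8.0], np.full(4, 2.0))
# All four slots are now nonzero; the former zeros became -2.0.
```

This is why the docstring now says "take care when applying to sparse input" rather than raising an exception: the operation is valid, but it loses the memory savings of the sparse representation.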