author     Sean Owen <sowen@cloudera.com>   2016-08-27 08:48:56 +0100
committer  Sean Owen <sowen@cloudera.com>   2016-08-27 08:48:56 +0100
commit     e07baf14120bc94b783649dabf5fffea58bff0de (patch)
tree       557979925874c18034e793057a9706c3ee6924fa /docs
parent     9fbced5b25c2f24d50c50516b4b7737f7e3eaf86 (diff)
[SPARK-17001][ML] Enable standardScaler to standardize sparse vectors when withMean=True
## What changes were proposed in this pull request?

Allow centering / mean scaling of sparse vectors in StandardScaler, if requested. This is for compatibility with `VectorAssembler` in common usages.

## How was this patch tested?

Jenkins tests, including new cases to reflect the new behavior.

Author: Sean Owen <sowen@cloudera.com>

Closes #14663 from srowen/SPARK-17001.
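A minimal sketch (not part of this patch) of the usage this change enables, assuming a `spark-shell` session with a `SparkSession` named `spark`; column names and data are illustrative:

```scala
import org.apache.spark.ml.feature.{StandardScaler, VectorAssembler}

// Illustrative numeric columns; VectorAssembler often emits sparse vectors
// when a row contains many zeros.
val df = spark.createDataFrame(Seq(
  (0.0, 1.0, 0.0),
  (2.0, 0.0, 3.0),
  (0.0, 0.0, 4.0)
)).toDF("a", "b", "c")

val assembled = new VectorAssembler()
  .setInputCols(Array("a", "b", "c"))
  .setOutputCol("features")
  .transform(df)

// With this change, withMean = true is accepted on the assembler's (possibly
// sparse) output instead of raising an exception; centering densifies the result.
val scaled = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithMean(true)
  .setWithStd(true)
  .fit(assembled)
  .transform(assembled)
```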
Diffstat (limited to 'docs')
-rw-r--r--  docs/ml-features.md               2
-rw-r--r--  docs/mllib-feature-extraction.md  2
2 files changed, 2 insertions, 2 deletions
diff --git a/docs/ml-features.md b/docs/ml-features.md
index e41bf78521..746593fb9e 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -768,7 +768,7 @@ for more details on the API.
`StandardScaler` transforms a dataset of `Vector` rows, normalizing each feature to have unit standard deviation and/or zero mean. It takes parameters:
* `withStd`: True by default. Scales the data to unit standard deviation.
-* `withMean`: False by default. Centers the data with mean before scaling. It will build a dense output, so this does not work on sparse input and will raise an exception.
+* `withMean`: False by default. Centers the data with mean before scaling. It will build a dense output, so take care when applying to sparse input.
`StandardScaler` is an `Estimator` which can be `fit` on a dataset to produce a `StandardScalerModel`; this amounts to computing summary statistics. The model can then transform a `Vector` column in a dataset to have unit standard deviation and/or zero mean features.
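To illustrate the caveat in the updated wording, here is a short hedged sketch (assuming the `spark.ml` DataFrame API and an existing `SparkSession` named `spark`): centering sparse vectors is now allowed, but the scaled output is dense, so memory use can grow for wide, sparse features.

```scala
import org.apache.spark.ml.feature.StandardScaler
import org.apache.spark.ml.linalg.Vectors

// Illustrative sparse feature vectors (size 3, mostly zeros).
val df = spark.createDataFrame(Seq(
  Tuple1(Vectors.sparse(3, Seq((0, 1.0), (2, 3.0)))),
  Tuple1(Vectors.sparse(3, Seq((1, 2.0))))
)).toDF("features")

val model = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithMean(true)   // no longer raises an exception on sparse input
  .fit(df)

// The centered vectors come back dense even though the input was sparse.
model.transform(df).show(truncate = false)
```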
diff --git a/docs/mllib-feature-extraction.md b/docs/mllib-feature-extraction.md
index 867be7f293..353d391249 100644
--- a/docs/mllib-feature-extraction.md
+++ b/docs/mllib-feature-extraction.md
@@ -148,7 +148,7 @@ against features with very large variances exerting an overly large influence du
following parameters in the constructor:
* `withMean` False by default. Centers the data with mean before scaling. It will build a dense
-output, so this does not work on sparse input and will raise an exception.
+output, so take care when applying to sparse input.
* `withStd` True by default. Scales the data to unit standard deviation.
We provide a [`fit`](api/scala/index.html#org.apache.spark.mllib.feature.StandardScaler) method in
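For the RDD-based `spark.mllib` API touched by this second hunk, a comparable hedged sketch (assuming a `SparkContext` named `sc`; the data is illustrative):

```scala
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.linalg.Vectors

// Illustrative RDD of sparse vectors.
val data = sc.parallelize(Seq(
  Vectors.sparse(3, Seq((0, 1.0), (2, 3.0))),
  Vectors.sparse(3, Seq((1, 2.0)))
))

// withMean = true now works on sparse input here too; the transformed
// vectors are dense because centering removes the zeros.
val scaler = new StandardScaler(withMean = true, withStd = true).fit(data)
val scaled = scaler.transform(data)
```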