author: Sean Owen <sowen@cloudera.com> 2014-07-03 11:54:51 -0700
committer: Xiangrui Meng <meng@databricks.com> 2014-07-03 11:54:51 -0700
commit: 2b36344f588d4e7357ce9921dc656e2389ba1dea
tree: 6bf34b3c688579cf314df91932ef31745f39cf9b /mllib/src
parent: c480537739f9329ebfd580f09c69778e6c976366
SPARK-1675. Make clear whether computePrincipalComponents requires centered data
Just closing out this small JIRA, resolving with a comment change.

Author: Sean Owen <sowen@cloudera.com>

Closes #1171 from srowen/SPARK-1675 and squashes the following commits:

45ee9b7 [Sean Owen] Add simple note that data need not be centered for computePrincipalComponents
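
Illustrative note (not part of the commit): one plausible reason centering is unnecessary is that principal components are conventionally taken from the covariance matrix, and computing a covariance already subtracts each column mean. The plain-Scala sketch below checks that property on toy data. It does not use Spark, and every name in it (CenteringSketch, columnMeans, covariance, shift, maxAbsDiff) is hypothetical rather than an MLlib API; whether RowMatrix follows exactly this route is an assumption here, not something the commit states.

```scala
// Plain-Scala sketch, no Spark dependency. All names are hypothetical helpers,
// not MLlib APIs. It shows that a sample covariance matrix is unchanged when a
// constant offset is added to every row, i.e. when the data are not centered.
object CenteringSketch {
  type Rows = Seq[Array[Double]]

  // Mean of each column.
  def columnMeans(rows: Rows): Array[Double] = {
    val n = rows.head.length
    val sums = new Array[Double](n)
    for (r <- rows; j <- 0 until n) sums(j) += r(j)
    sums.map(_ / rows.length)
  }

  // Sample covariance (divides by m - 1). The column means are subtracted here,
  // which is why shifting the input does not change the result.
  def covariance(rows: Rows): Array[Array[Double]] = {
    val m = rows.length
    val n = rows.head.length
    val mu = columnMeans(rows)
    Array.tabulate(n, n) { (i, j) =>
      rows.map(r => (r(i) - mu(i)) * (r(j) - mu(j))).sum / (m - 1)
    }
  }

  // Add a constant offset to every row, moving the column means away from 0.
  def shift(rows: Rows, offset: Array[Double]): Rows =
    rows.map(r => r.zip(offset).map { case (x, o) => x + o })

  // Largest element-wise absolute difference between two matrices.
  def maxAbsDiff(a: Array[Array[Double]], b: Array[Array[Double]]): Double =
    (for (i <- a.indices; j <- a(i).indices)
      yield math.abs(a(i)(j) - b(i)(j))).max

  // Toy data: four rows, two columns, plus a copy with shifted column means.
  val data: Rows = Seq(
    Array(1.0, 2.0), Array(3.0, 1.0), Array(5.0, 6.0), Array(7.0, 3.0))
  val shifted: Rows = shift(data, Array(100.0, -50.0))
}
```

Comparing covariance(data) with covariance(shifted) via maxAbsDiff should give a value at floating-point noise level, so any eigenvectors derived from the two matrices would agree as well.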
Diffstat (limited to 'mllib/src')
-rw-r--r--  mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala  2
1 file changed, 2 insertions, 0 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
index 1a0073c9d4..695e03b736 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
@@ -347,6 +347,8 @@ class RowMatrix(
    * The principal components are stored a local matrix of size n-by-k.
    * Each column corresponds for one principal component,
    * and the columns are in descending order of component variance.
+   * The row data do not need to be "centered" first; it is not necessary for
+   * the mean of each column to be 0.
    *
    * @param k number of top principal components.
    * @return a matrix of size n-by-k, whose columns are principal components