author | Yuhao Yang <hhbyyh@gmail.com> | 2015-02-01 19:40:26 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-02-01 19:40:26 -0800 |
commit | d85cd4eb1479f8d37dab360530dc2c71216b4a8d (patch) | |
tree | 45f5c8e298701bb989a9cd5fdb74d9a763293f39 /bin/spark-submit.cmd | |
parent | ec1003219b8978291abca2fc409ee61b1bb40a38 (diff) | |
[Spark-5406][MLlib] LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound
JIRA link: https://issues.apache.org/jira/browse/SPARK-5406
The work-array allocation in breeze svd imposes an upper bound on n for LocalLAPACK mode in RowMatrix.computeSVD.
Code from breeze svd (https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/functions/svd.scala):
    val workSize = ( 3
      * scala.math.min(m, n)
      * scala.math.min(m, n)
      + scala.math.max(scala.math.max(m, n), 4 * scala.math.min(m, n)
        * scala.math.min(m, n) + 4 * scala.math.min(m, n))
    )
    val work = new Array[Double](workSize)
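As a result, taking min(m, n) = n and assuming the 4 * n * n + 4 * n term dominates the max, n must at least satisfy 7 * n * n + 4 * n < Int.MaxValue (the exact array-size limit depends on the JVM).
In worse cases, such as n = 25000, the Int arithmetic overflows far enough that the work size becomes positive again (80032704), which leads to weird behavior.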
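The overflow described above can be checked with a small sketch. It is not part of the patch: the object name WorkSizeOverflow and its helpers are hypothetical, and it evaluates only the simplified bound 7 * n * n + 4 * n from this message rather than calling breeze. Note that the 80032704 figure quoted above corresponds to the 7 * n * n term alone in 32-bit arithmetic; adding the 4 * n term gives 80132704.

```scala
// Hypothetical helper, not part of Spark or breeze: evaluates the simplified
// work-size bound 7 * n * n + 4 * n from the commit message.
object WorkSizeOverflow {
  // Exact value, computed in 64-bit arithmetic.
  def boundAsLong(n: Int): Long = 7L * n * n + 4L * n

  // Same expression in 32-bit Int arithmetic, which silently wraps on overflow.
  def boundAsInt(n: Int): Int = 7 * n * n + 4 * n

  // Largest n whose exact bound still fits in an Int.
  def maxSafeN: Int = {
    var n = 1
    while (boundAsLong(n + 1) <= Int.MaxValue) n += 1
    n
  }

  def main(args: Array[String]): Unit = {
    for (n <- Seq(17514, 17515, 20000, 25000)) {
      println(s"n=$n exact=${boundAsLong(n)} asInt=${boundAsInt(n)}")
    }
    println(s"largest n whose bound fits in an Int: $maxSafeN")
  }
}
```

Under this simplified bound, n = 20000 wraps to a negative size, while n = 25000 wraps all the way around to a small positive size, so the allocation would succeed with a work array far smaller than required.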
This PR is only a first step toward supporting Genbase (an important biological benchmark that would help promote Spark for genetic applications, http://www.paradigm4.com/wp-content/uploads/2014/06/Genomics-Benchmark-Technical-Report.pdf),
which needs to compute the SVD of matrices up to 60K * 70K. I found many potential issues and would like to know whether there is any plan under way to expand the range of matrix computations supported on Spark.
Thanks.
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes #4200 from hhbyyh/rowMatrix and squashes the following commits:
f7864d0 [Yuhao Yang] update auto logic for rowMatrix svd
23860e4 [Yuhao Yang] fix comment style
e48a6e4 [Yuhao Yang] make latent svd computation constraint clear
Diffstat (limited to 'bin/spark-submit.cmd')
0 files changed, 0 insertions, 0 deletions