author | Li Pu <lpu@twitter.com> | 2014-07-11 23:26:47 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-07-11 23:26:47 -0700 |
commit | d38887b8a0d00a11d7cf9393e7cb0918c3ec7a22 (patch) | |
tree | d2adf7acf042f86b19a28105b23265e28c4dd452 /mllib | |
parent | 55960869358d4f8aa5b2e3b17d87b0b02ba9acdd (diff) | |
use specialized axpy in RowMatrix for SVD
After running more tests on a large matrix, I found that the BV axpy (breeze/linalg/Vector.scala, axpy) is slower than the BSV axpy (breeze/linalg/operators/SparseVectorOps.scala, sv_dv_axpy): 8s vs. 2s per multiplication. The BV axpy operates on an iterator, while the BSV axpy operates directly on the underlying array. I think the overhead comes from creating the iterator (with a zip) and advancing the pointers.
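The difference between the two code paths can be sketched in plain Scala. This is only an illustration of the iterator-versus-array contrast described above, not Breeze's actual implementation: the object `AxpySketch` and both method names are hypothetical, and the sparse vector is assumed to be stored as parallel arrays of active indices and values.

```scala
// Hypothetical stand-ins for illustration; these are NOT Breeze's implementations.
object AxpySketch {

  // Generic path: zips two iterators and visits (index, value) pairs.
  // Each element goes through an iterator step and a tuple allocation.
  def axpyGeneric(a: Double, indices: Array[Int], values: Array[Double], y: Array[Double]): Unit =
    indices.iterator.zip(values.iterator).foreach { case (i, v) => y(i) += a * v }

  // Specialized path: a while loop directly over the underlying primitive arrays.
  def axpySparse(a: Double, indices: Array[Int], values: Array[Double], y: Array[Double]): Unit = {
    var k = 0
    while (k < indices.length) {
      y(indices(k)) += a * values(k)
      k += 1
    }
  }
}
```

Both methods compute the same update, y += a * x. Only the traversal differs, which is consistent with the explanation that the overhead sits in the iterator rather than in the arithmetic.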
Author: Li Pu <lpu@twitter.com>
Author: Xiangrui Meng <meng@databricks.com>
Author: Li Pu <li.pu@outlook.com>
Closes #1378 from vrilleup/master and squashes the following commits:
6fb01a3 [Li Pu] use specialized axpy in RowMatrix
5255f2a [Li Pu] Merge remote-tracking branch 'upstream/master'
7312ec1 [Li Pu] very minor comment fix
4c618e9 [Li Pu] Merge pull request #1 from mengxr/vrilleup-master
a461082 [Xiangrui Meng] make superscript show up correctly in doc
861ec48 [Xiangrui Meng] simplify axpy
62969fa [Xiangrui Meng] use BDV directly in symmetricEigs; change the computation mode to local-svd, local-eigs, and dist-eigs; update tests and docs
c273771 [Li Pu] automatically determine SVD compute mode and parameters
7148426 [Li Pu] improve RowMatrix multiply
5543cce [Li Pu] improve svd api
819824b [Li Pu] add flag for dense svd or sparse svd
eb15100 [Li Pu] fix binary compatibility
4c7aec3 [Li Pu] improve comments
e7850ed [Li Pu] use aggregate and axpy
827411b [Li Pu] fix EOF new line
9c80515 [Li Pu] use non-sparse implementation when k = n
fe983b0 [Li Pu] improve scala style
96d2ecb [Li Pu] improve eigenvalue sorting
e1db950 [Li Pu] SPARK-1782: svd for sparse matrix using ARPACK
Diffstat (limited to 'mllib')
-rw-r--r-- | mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala | 8 |
1 files changed, 7 insertions, 1 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
index 711e32a330..f4c403bc78 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
@@ -83,7 +83,13 @@ class RowMatrix(
       seqOp = (U, r) => {
         val rBrz = r.toBreeze
         val a = rBrz.dot(vbr.value)
-        brzAxpy(a, rBrz, U.asInstanceOf[BV[Double]])
+        rBrz match {
+          // use specialized axpy for better performance
+          case _: BDV[_] => brzAxpy(a, rBrz.asInstanceOf[BDV[Double]], U)
+          case _: BSV[_] => brzAxpy(a, rBrz.asInstanceOf[BSV[Double]], U)
+          case _ => throw new UnsupportedOperationException(
+            s"Do not support vector operation from type ${rBrz.getClass.getName}.")
+        }
         U
       },
       combOp = (U1, U2) => U1 += U2
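As a rough local sketch of what the patched `seqOp` does: for each row r it computes a = r · v and then accumulates U += a * r, dispatching on the concrete row type so that each case uses a loop specialized for its storage. Summed over all rows, this yields AᵀA v. The types `DenseVec` and `SparseVec` and the object `MultiplyGramianSketch` below are hypothetical stand-ins for the Breeze `BDV` and `BSV` classes, and a plain `foldLeft` over a local `Seq` stands in for the distributed `aggregate`.

```scala
// Hypothetical local model of the seqOp in the diff; not Spark or Breeze code.
object MultiplyGramianSketch {
  sealed trait Vec
  final case class DenseVec(values: Array[Double]) extends Vec
  final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) extends Vec

  // r . v, with one specialized loop per storage layout
  def dot(r: Vec, v: Array[Double]): Double = r match {
    case DenseVec(xs) =>
      var s = 0.0
      var i = 0
      while (i < xs.length) { s += xs(i) * v(i); i += 1 }
      s
    case SparseVec(_, idx, xs) =>
      var s = 0.0
      var k = 0
      while (k < idx.length) { s += xs(k) * v(idx(k)); k += 1 }
      s
  }

  // u += a * r, again dispatching on the concrete row type
  def axpy(a: Double, r: Vec, u: Array[Double]): Unit = r match {
    case DenseVec(xs) =>
      var i = 0
      while (i < xs.length) { u(i) += a * xs(i); i += 1 }
    case SparseVec(_, idx, xs) =>
      var k = 0
      while (k < idx.length) { u(idx(k)) += a * xs(k); k += 1 }
  }

  // Accumulates sum over rows of (r . v) * r, i.e. A^T A v for the matrix A formed by the rows.
  def multiplyGramian(rows: Seq[Vec], v: Array[Double]): Array[Double] =
    rows.foldLeft(new Array[Double](v.length)) { (u, r) =>
      axpy(dot(r, v), r, u)
      u
    }
}
```

Because `Vec` is sealed here, the match is exhaustive and needs no fallback. The patch matches on an open vector type, which is presumably why it ends with a default case that throws `UnsupportedOperationException`.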