use specialized axpy in RowMatrix for SVD

After running some more tests on a large matrix, I found that the BV axpy (breeze/linalg/Vector.scala, axpy) is slower than the BSV axpy (breeze/linalg/operators/SparseVectorOps.scala, sv_dv_axpy): 8s vs. 2s for each multiplication. The BV axpy operates on an iterator, while the BSV axpy operates directly on the underlying array. I think the overhead comes from creating the iterator (with a zip) and advancing the pointers.

Author: Li Pu <lpu@twitter.com>
Author: Xiangrui Meng <meng@databricks.com>
Author: Li Pu <li.pu@outlook.com>

Closes #1378 from vrilleup/master and squashes the following commits:

6fb01a3 [Li Pu] use specialized axpy in RowMatrix
5255f2a [Li Pu] Merge remote-tracking branch 'upstream/master'
7312ec1 [Li Pu] very minor comment fix
4c618e9 [Li Pu] Merge pull request #1 from mengxr/vrilleup-master
a461082 [Xiangrui Meng] make superscript show up correctly in doc
861ec48 [Xiangrui Meng] simplify axpy
62969fa [Xiangrui Meng] use BDV directly in symmetricEigs change the computation mode to local-svd, local-eigs, and dist-eigs update tests and docs
c273771 [Li Pu] automatically determine SVD compute mode and parameters
7148426 [Li Pu] improve RowMatrix multiply
5543cce [Li Pu] improve svd api
819824b [Li Pu] add flag for dense svd or sparse svd
eb15100 [Li Pu] fix binary compatibility
4c7aec3 [Li Pu] improve comments
e7850ed [Li Pu] use aggregate and axpy
827411b [Li Pu] fix EOF new line
9c80515 [Li Pu] use non-sparse implementation when k = n
fe983b0 [Li Pu] improve scala style
96d2ecb [Li Pu] improve eigenvalue sorting
e1db950 [Li Pu] SPARK-1782: svd for sparse matrix using ARPACK
author: Li Pu <lpu@twitter.com> 2014-07-11 23:26:47 -0700
committer: Xiangrui Meng <meng@databricks.com> 2014-07-11 23:26:47 -0700
commit: d38887b8a0d00a11d7cf9393e7cb0918c3ec7a22 (patch)
tree: d2adf7acf042f86b19a28105b23265e28c4dd452 /mllib
parent: 55960869358d4f8aa5b2e3b17d87b0b02ba9acdd (diff)
download: spark-d38887b8a0d00a11d7cf9393e7cb0918c3ec7a22.tar.gz
spark-d38887b8a0d00a11d7cf9393e7cb0918c3ec7a22.tar.bz2
spark-d38887b8a0d00a11d7cf9393e7cb0918c3ec7a22.zip
1 files changed, 7 insertions, 1 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
index 711e32a330..f4c403bc78 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
@@ -83,7 +83,13 @@ class RowMatrix(
       seqOp = (U, r) => {
         val rBrz = r.toBreeze
         val a = rBrz.dot(vbr.value)
-        brzAxpy(a, rBrz, U.asInstanceOf[BV[Double]])
+        rBrz match {
+          // use specialized axpy for better performance
+          case _: BDV[_] => brzAxpy(a, rBrz.asInstanceOf[BDV[Double]], U)
+          case _: BSV[_] => brzAxpy(a, rBrz.asInstanceOf[BSV[Double]], U)
+          case _ => throw new UnsupportedOperationException(
+            s"Do not support vector operation from type ${rBrz.getClass.getName}.")
+        }
         U
       },
       combOp = (U1, U2) => U1 += U2
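
The commit message contrasts an iterator-based axpy with one that works directly on the underlying array, and the diff picks the specialized version by matching on the runtime type. The sketch below illustrates both ideas in plain Scala with no Breeze dependency. The types `Vec`, `Dense`, `Sparse` and the object `AxpySketch` are hypothetical stand-ins invented for illustration; they are not Breeze's classes and do not reproduce its implementation.

```scala
// Hypothetical stand-ins for dense and sparse vectors (not Breeze's API).
sealed trait Vec {
  def length: Int
  // Iterator over (index, value) pairs, as a generic code path would see them.
  def activeIterator: Iterator[(Int, Double)]
}

final case class Dense(values: Array[Double]) extends Vec {
  def length: Int = values.length
  def activeIterator: Iterator[(Int, Double)] =
    values.indices.iterator.map(i => (i, values(i)))
}

final case class Sparse(indices: Array[Int], values: Array[Double], length: Int) extends Vec {
  // The zip allocates a tuple per element, one source of iterator overhead.
  def activeIterator: Iterator[(Int, Double)] = indices.iterator.zip(values.iterator)
}

object AxpySketch {
  // Generic path: y += a * x through an iterator of (index, value) pairs.
  def axpyGeneric(a: Double, x: Vec, y: Array[Double]): Unit = {
    require(x.length == y.length, "dimension mismatch")
    x.activeIterator.foreach { case (i, v) => y(i) += a * v }
  }

  // Specialized dense path: a while loop over the backing array.
  def axpyDense(a: Double, x: Dense, y: Array[Double]): Unit = {
    require(x.length == y.length, "dimension mismatch")
    var i = 0
    while (i < y.length) {
      y(i) += a * x.values(i)
      i += 1
    }
  }

  // Specialized sparse path: touch only the stored entries, no tuples.
  def axpySparse(a: Double, x: Sparse, y: Array[Double]): Unit = {
    require(x.length == y.length, "dimension mismatch")
    var k = 0
    while (k < x.indices.length) {
      y(x.indices(k)) += a * x.values(k)
      k += 1
    }
  }

  // Dispatch on the runtime type, mirroring the shape of the match in the diff.
  def axpy(a: Double, x: Vec, y: Array[Double]): Unit = x match {
    case d: Dense  => axpyDense(a, d, y)
    case s: Sparse => axpySparse(a, s, y)
  }
}
```

Because `Vec` is sealed here, the match is exhaustive and needs no fallback case. The diff matches on an open trait, which is presumably why it adds a default branch that throws `UnsupportedOperationException`.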