author     Mike Dusenberry <mwdusenb@us.ibm.com>  2015-08-05 07:40:50 -0700
committer  Xiangrui Meng <meng@databricks.com>    2015-08-05 07:40:50 -0700
commit     34dcf10104460816382908b2b8eeb6c925e862bf (patch)
tree       a4767e939dd4a33e2d3eda23696ffcf03f55c4fa /docs/mllib-data-types.md
parent     519cf6d3f764a977770266784d6902fe205a070f (diff)
[SPARK-6486] [MLLIB] [PYTHON] Add BlockMatrix to PySpark.
mengxr This adds the `BlockMatrix` to PySpark. I have the conversions to `IndexedRowMatrix` and `CoordinateMatrix` ready as well, so once PR #7554 is completed (which relies on PR #7746), this PR can be finished.

Author: Mike Dusenberry <mwdusenb@us.ibm.com>

Closes #7761 from dusenberrymw/SPARK-6486_Add_BlockMatrix_to_PySpark and squashes the following commits:

27195c2 [Mike Dusenberry] Adding one more check to _convert_to_matrix_block_tuple, and a few minor documentation changes.
ae50883 [Mike Dusenberry] Minor update: BlockMatrix should inherit from DistributedMatrix.
b8acc1c [Mike Dusenberry] Moving BlockMatrix to pyspark.mllib.linalg.distributed, updating the logic to match that of the other distributed matrices, adding conversions, and adding documentation.
c014002 [Mike Dusenberry] Using properties for better documentation.
3bda6ab [Mike Dusenberry] Adding documentation.
8fb3095 [Mike Dusenberry] Small cleanup.
e17af2e [Mike Dusenberry] Adding BlockMatrix to PySpark.
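As a quick orientation (not part of the patch), here is a minimal sketch of how the new PySpark `BlockMatrix` fits together with the other distributed matrix types, using only the constructors and conversion methods documented in the diff below; it assumes an existing SparkContext `sc`, as in the doc examples, and the toy data and variable names are illustrative.

{% highlight python %}
from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

# Build a small IndexedRowMatrix from an RDD of IndexedRows.
rows = sc.parallelize([IndexedRow(0, [1, 2, 3]),
                       IndexedRow(1, [4, 5, 6])])
indexedRowMat = IndexedRowMatrix(rows)

# Convert it to the new BlockMatrix and check its dimensions.
blockMat = indexedRowMat.toBlockMatrix()
m = blockMat.numRows()  # 2
n = blockMat.numCols()  # 3

# Round-trip through the other distributed matrix types.
coordinateMat = blockMat.toCoordinateMatrix()
indexedRowMatAgain = blockMat.toIndexedRowMatrix()
{% endhighlight %}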
Diffstat (limited to 'docs/mllib-data-types.md')
-rw-r--r--  docs/mllib-data-types.md  41
1 file changed, 41 insertions(+), 0 deletions(-)
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index 11033bf4f9..f0e8d54956 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -494,6 +494,9 @@ rowMat = mat.toRowMatrix()
# Convert to a CoordinateMatrix.
coordinateMat = mat.toCoordinateMatrix()
+
+# Convert to a BlockMatrix.
+blockMat = mat.toBlockMatrix()
{% endhighlight %}
</div>
@@ -594,6 +597,9 @@ rowMat = mat.toRowMatrix()
# Convert to an IndexedRowMatrix.
indexedRowMat = mat.toIndexedRowMatrix()
+
+# Convert to a BlockMatrix.
+blockMat = mat.toBlockMatrix()
{% endhighlight %}
</div>
@@ -661,4 +667,39 @@ matA.validate();
BlockMatrix ata = matA.transpose().multiply(matA);
{% endhighlight %}
</div>
+
+<div data-lang="python" markdown="1">
+
+A [`BlockMatrix`](api/python/pyspark.mllib.html#pyspark.mllib.linalg.distributed.BlockMatrix)
+can be created from an `RDD` of sub-matrix blocks, where a sub-matrix block is a
+`((blockRowIndex, blockColIndex), sub-matrix)` tuple.
+
+{% highlight python %}
+from pyspark.mllib.linalg import Matrices
+from pyspark.mllib.linalg.distributed import BlockMatrix
+
+# Create an RDD of sub-matrix blocks.
+blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])),
+ ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))])
+
+# Create a BlockMatrix from an RDD of sub-matrix blocks.
+mat = BlockMatrix(blocks, 3, 2)
+
+# Get its size.
+m = mat.numRows() # 6
+n = mat.numCols() # 2
+
+# Get the blocks as an RDD of sub-matrix blocks.
+blocksRDD = mat.blocks
+
+# Convert to a LocalMatrix.
+localMat = mat.toLocalMatrix()
+
+# Convert to an IndexedRowMatrix.
+indexedRowMat = mat.toIndexedRowMatrix()
+
+# Convert to a CoordinateMatrix.
+coordinateMat = mat.toCoordinateMatrix()
+{% endhighlight %}
+</div>
</div>