author     Mike Dusenberry <mwdusenb@us.ibm.com>  2015-08-05 07:40:50 -0700
committer  Xiangrui Meng <meng@databricks.com>    2015-08-05 07:40:50 -0700
commit     34dcf10104460816382908b2b8eeb6c925e862bf (patch)
tree       a4767e939dd4a33e2d3eda23696ffcf03f55c4fa /docs/mllib-data-types.md
parent     519cf6d3f764a977770266784d6902fe205a070f (diff)
[SPARK-6486] [MLLIB] [PYTHON] Add BlockMatrix to PySpark.
mengxr This adds the `BlockMatrix` to PySpark. I have the conversions to `IndexedRowMatrix` and `CoordinateMatrix` ready as well, so once PR #7554 is completed (which relies on PR #7746), this PR can be finished.

Author: Mike Dusenberry <mwdusenb@us.ibm.com>

Closes #7761 from dusenberrymw/SPARK-6486_Add_BlockMatrix_to_PySpark and squashes the following commits:

27195c2 [Mike Dusenberry] Adding one more check to _convert_to_matrix_block_tuple, and a few minor documentation changes.
ae50883 [Mike Dusenberry] Minor update: BlockMatrix should inherit from DistributedMatrix.
b8acc1c [Mike Dusenberry] Moving BlockMatrix to pyspark.mllib.linalg.distributed, updating the logic to match that of the other distributed matrices, adding conversions, and adding documentation.
c014002 [Mike Dusenberry] Using properties for better documentation.
3bda6ab [Mike Dusenberry] Adding documentation.
8fb3095 [Mike Dusenberry] Small cleanup.
e17af2e [Mike Dusenberry] Adding BlockMatrix to PySpark.
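As a quick orientation (not part of the patch), here is a minimal sketch of how the new PySpark `BlockMatrix` fits together with the other distributed matrix types, using only the constructors and conversion methods documented in the diff below; it assumes an existing SparkContext `sc`, as in the doc examples, and the toy data and variable names are illustrative.

{% highlight python %}
from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

# Build a small IndexedRowMatrix from an RDD of IndexedRows.
rows = sc.parallelize([IndexedRow(0, [1, 2, 3]),
                       IndexedRow(1, [4, 5, 6])])
indexedRowMat = IndexedRowMatrix(rows)

# Convert it to the new BlockMatrix and check its dimensions.
blockMat = indexedRowMat.toBlockMatrix()
m = blockMat.numRows()  # 2
n = blockMat.numCols()  # 3

# Round-trip through the other distributed matrix types.
coordinateMat = blockMat.toCoordinateMatrix()
indexedRowMatAgain = blockMat.toIndexedRowMatrix()
{% endhighlight %}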
Diffstat (limited to 'docs/mllib-data-types.md')
-rw-r--r--  docs/mllib-data-types.md  41
1 file changed, 41 insertions(+), 0 deletions(-)
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index 11033bf4f9..f0e8d54956 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -494,6 +494,9 @@ rowMat = mat.toRowMatrix()
# Convert to a CoordinateMatrix.
coordinateMat = mat.toCoordinateMatrix()
+
+# Convert to a BlockMatrix.
+blockMat = mat.toBlockMatrix()
{% endhighlight %}
</div>
@@ -594,6 +597,9 @@ rowMat = mat.toRowMatrix()
# Convert to an IndexedRowMatrix.
indexedRowMat = mat.toIndexedRowMatrix()
+
+# Convert to a BlockMatrix.
+blockMat = mat.toBlockMatrix()
{% endhighlight %}
</div>
@@ -661,4 +667,39 @@ matA.validate();
BlockMatrix ata = matA.transpose().multiply(matA);
{% endhighlight %}
</div>
+
+<div data-lang="python" markdown="1">
+
+A [`BlockMatrix`](api/python/pyspark.mllib.html#pyspark.mllib.linalg.distributed.BlockMatrix)
+can be created from an `RDD` of sub-matrix blocks, where a sub-matrix block is a
+`((blockRowIndex, blockColIndex), sub-matrix)` tuple.
+
+{% highlight python %}
+from pyspark.mllib.linalg import Matrices
+from pyspark.mllib.linalg.distributed import BlockMatrix
+
+# Create an RDD of sub-matrix blocks.
+blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])),
+ ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))])
+
+# Create a BlockMatrix from an RDD of sub-matrix blocks.
+mat = BlockMatrix(blocks, 3, 2)
+
+# Get its size.
+m = mat.numRows() # 6
+n = mat.numCols() # 2
+
+# Get the blocks as an RDD of sub-matrix blocks.
+blocksRDD = mat.blocks
+
+# Convert to a LocalMatrix.
+localMat = mat.toLocalMatrix()
+
+# Convert to an IndexedRowMatrix.
+indexedRowMat = mat.toIndexedRowMatrix()
+
+# Convert to a CoordinateMatrix.
+coordinateMat = mat.toCoordinateMatrix()
+{% endhighlight %}
+</div>
</div>