aboutsummaryrefslogtreecommitdiff
path: root/graphx
diff options
context:
space:
mode:
authorkj-ki <kikushima.kenji@lab.ntt.co.jp>2015-01-06 09:49:37 -0800
committerAnkur Dave <ankurdave@gmail.com>2015-01-06 09:49:37 -0800
commit5e3ec1110495899a298313c4aa9c6c151c1f54da (patch)
treeb7594e143573bc7facb4af3022fc461c71e81558 /graphx
parenta6394bc2c094c6c662237236c2effa2dabe67910 (diff)
downloadspark-5e3ec1110495899a298313c4aa9c6c151c1f54da.tar.gz
spark-5e3ec1110495899a298313c4aa9c6c151c1f54da.tar.bz2
spark-5e3ec1110495899a298313c4aa9c6c151c1f54da.zip
[Minor] Fix comments for GraphX 2D partitioning strategy
The sum of vertices on matrix (v0 to v11) is 12. And, I think one same block overlaps in this strategy. This is minor PR, so I didn't file in JIRA. Author: kj-ki <kikushima.kenji@lab.ntt.co.jp> Closes #3904 from kj-ki/fix-partitionstrategy-comments and squashes the following commits: 79829d9 [kj-ki] Fix comments for 2D partitioning.
Diffstat (limited to 'graphx')
-rw-r--r--graphx/src/main/scala/org/apache/spark/graphx/PartitionStrategy.scala6
1 files changed, 3 insertions, 3 deletions
diff --git a/graphx/src/main/scala/org/apache/spark/graphx/PartitionStrategy.scala b/graphx/src/main/scala/org/apache/spark/graphx/PartitionStrategy.scala
index 13033fee0e..7372dfbd9f 100644
--- a/graphx/src/main/scala/org/apache/spark/graphx/PartitionStrategy.scala
+++ b/graphx/src/main/scala/org/apache/spark/graphx/PartitionStrategy.scala
@@ -32,9 +32,9 @@ trait PartitionStrategy extends Serializable {
object PartitionStrategy {
/**
* Assigns edges to partitions using a 2D partitioning of the sparse edge adjacency matrix,
- * guaranteeing a `2 * sqrt(numParts)` bound on vertex replication.
+ * guaranteeing a `2 * sqrt(numParts) - 1` bound on vertex replication.
*
- * Suppose we have a graph with 11 vertices that we want to partition
+ * Suppose we have a graph with 12 vertices that we want to partition
* over 9 machines. We can use the following sparse matrix representation:
*
* <pre>
@@ -61,7 +61,7 @@ object PartitionStrategy {
* that edges adjacent to `v11` can only be in the first column of blocks `(P0, P3,
* P6)` or the last
* row of blocks `(P6, P7, P8)`. As a consequence we can guarantee that `v11` will need to be
- * replicated to at most `2 * sqrt(numParts)` machines.
+ * replicated to at most `2 * sqrt(numParts) - 1` machines.
*
* Notice that `P0` has many edges and as a consequence this partitioning would lead to poor work
* balance. To improve balance we first multiply each vertex id by a large prime to shuffle the