author Ankur Dave <ankurdave@apache.org> 2014-06-03 02:28:23 +0000
committer Ankur Dave <ankurdave@apache.org> 2014-06-03 02:28:23 +0000
commit 638088923dbfe94215c4e0edfac8beb2e7b483f8
tree 10237647e54dfa6329e7c54d459fb0464d4db9eb /site/docs/1.0.0/api
parent 43588164336771f787d0d2cdf79f0d50ac828af4
Suggest workarounds for partitionBy in Spark 1.0.0 due to SPARK-1931
Applied PR #908 to the generated docs: https://github.com/apache/spark/pull/908
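
The workaround added in this commit keys each edge by the partition index returned from `partitionStrategy.getPartition` and then repartitions with a `HashPartitioner`. A minimal sketch of why that should put each edge in the intended partition, in plain Scala with no Spark dependency: `nonNegativeMod` is a local stand-in for the modulo step a hash partitioner applies to `key.hashCode`, `HashPlacementSketch` and `hashPartition` are hypothetical names used only for illustration, and the strategy is assumed to return an index in `[0, numPartitions)`.

```scala
object HashPlacementSketch {
  // Local stand-in for the modulo step a hash partitioner applies to key.hashCode
  // (an assumption for illustration; this is not Spark's own code).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Partition index a hash partitioner with numPartitions would pick for an Int key.
  def hashPartition(key: Int, numPartitions: Int): Int =
    nonNegativeMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    val numPartitions = 4
    // Assumed: the strategy returns an index in [0, numPartitions).
    // An Int's hashCode is the Int itself, so each such key maps to its own index.
    val placements =
      (0 until numPartitions).map(k => k -> hashPartition(k, numPartitions))
    placements.foreach { case (k, p) => println(s"key $k -> partition $p") }
    assert(placements.forall { case (k, p) => k == p })
  }
}
```

Because the key already is the target partition index, hashing it leaves the placement unchanged, which is why the workaround can reuse a generic partitioner instead of a custom one.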
Diffstat (limited to 'site/docs/1.0.0/api')
-rw-r--r-- site/docs/1.0.0/api/scala/org/apache/spark/graphx/Graph.html | 25
1 file changed, 23 insertions, 2 deletions
diff --git a/site/docs/1.0.0/api/scala/org/apache/spark/graphx/Graph.html b/site/docs/1.0.0/api/scala/org/apache/spark/graphx/Graph.html
index 0c261ea00..c04282e7f 100644
--- a/site/docs/1.0.0/api/scala/org/apache/spark/graphx/Graph.html
+++ b/site/docs/1.0.0/api/scala/org/apache/spark/graphx/Graph.html
@@ -316,7 +316,7 @@ provided for a particular vertex in the graph, the map function receives <code>N
(vid, data, optDeg) <span class="kw">=&gt;</span> optDeg.getOrElse(<span class="num">0</span>)
}</pre></li></ol>
</div></dl></div>
- </li><li name="org.apache.spark.graphx.Graph#partitionBy" visbl="pub" data-isabs="true" fullComment="no" group="Ungrouped">
+ </li><li name="org.apache.spark.graphx.Graph#partitionBy" visbl="pub" data-isabs="true" fullComment="yes" group="Ungrouped">
<a id="partitionBy(partitionStrategy:org.apache.spark.graphx.PartitionStrategy):org.apache.spark.graphx.Graph[VD,ED]"></a>
<a id="partitionBy(PartitionStrategy):Graph[VD,ED]"></a>
<h4 class="signature">
@@ -328,7 +328,28 @@ provided for a particular vertex in the graph, the map function receives <code>N
<span class="name">partitionBy</span><span class="params">(<span name="partitionStrategy">partitionStrategy: <a href="PartitionStrategy.html" class="extype" name="org.apache.spark.graphx.PartitionStrategy">PartitionStrategy</a></span>)</span><span class="result">: <a href="" class="extype" name="org.apache.spark.graphx.Graph">Graph</a>[<span class="extype" name="org.apache.spark.graphx.Graph.VD">VD</span>, <span class="extype" name="org.apache.spark.graphx.Graph.ED">ED</span>]</span>
</span>
</h4>
- <p class="shortcomment cmt">Repartitions the edges in the graph according to <code>partitionStrategy</code>.</p>
+ <p class="shortcomment cmt">Repartitions the edges in the graph according to <code>partitionStrategy</code> (WARNING: broken in
+Spark 1․0․0).</p><div class="fullcomment"><div class="comment cmt"><p>Repartitions the edges in the graph according to <code>partitionStrategy</code> (WARNING: broken in
+Spark 1․0․0).</p><p>To use this function in Spark 1.0.0, either build the latest version of Spark from the master
+branch, or apply the following workaround:</p><pre><span class="cmt">// Define our own version of partitionBy to work around SPARK-1931</span>
+<span class="kw">import</span> org.apache.spark.HashPartitioner
+<span class="kw">def</span> partitionBy[ED](
+ edges: RDD[Edge[ED]], partitionStrategy: PartitionStrategy): RDD[Edge[ED]] = {
+ <span class="kw">val</span> numPartitions = edges.partitions.size
+ edges.map(e <span class="kw">=&gt;</span> (partitionStrategy.getPartition(e.srcId, e.dstId, numPartitions), e))
+ .partitionBy(<span class="kw">new</span> HashPartitioner(numPartitions))
+ .mapPartitions(_.map(_._2), preservesPartitioning = <span class="kw">true</span>)
+}
+
+<span class="kw">val</span> vertices = ...
+<span class="kw">val</span> edges = ...
+
+<span class="cmt">// Instead of:</span>
+<span class="kw">val</span> g = Graph(vertices, edges)
+ .partitionBy(PartitionStrategy.EdgePartition2D) <span class="cmt">// broken in Spark 1.0.0</span>
+
+<span class="cmt">// Use:</span>
+<span class="kw">val</span> g = Graph(vertices, partitionBy(edges, PartitionStrategy.EdgePartition2D))</pre></div></div>
</li><li name="org.apache.spark.graphx.Graph#persist" visbl="pub" data-isabs="true" fullComment="yes" group="Ungrouped">
<a id="persist(newLevel:org.apache.spark.storage.StorageLevel):org.apache.spark.graphx.Graph[VD,ED]"></a>
<a id="persist(StorageLevel):Graph[VD,ED]"></a>