Finished second pass on pregel docs.

author: Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com> 2014-01-13 18:40:35 -0800
committer: Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com> 2014-01-13 18:40:43 -0800
commit: 552de5d42e395bad19f5d5fe6dcc1e678bb994a8 (patch)
tree: edaa2bcc04099bbf1b6139c42f1515975fc0849a /docs/graphx-programming-guide.md
parent: 622b7f7d391375cced8633e4a2546dbca60a3907 (diff)
download: spark-552de5d42e395bad19f5d5fe6dcc1e678bb994a8.tar.gz
spark-552de5d42e395bad19f5d5fe6dcc1e678bb994a8.tar.bz2
spark-552de5d42e395bad19f5d5fe6dcc1e678bb994a8.zip
1 files changed, 33 insertions, 12 deletions
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index c6505d21f1..77d807874f 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -484,10 +484,28 @@ messages destined to each vertex.  The `mapReduceTriplets` operator returns a `V
 containing the aggregate message (of type `A`) destined to each vertex.  Vertices that do not
 receive a message are not included in the returned `VertexRDD`.
 
-> Note that `mapReduceTriplets` takes an additional optional `activeSet` (see API docs) which
-> restricts the map phase to edges adjacent to the vertices in the provided `VertexRDD`. Restricting
-> computation to triplets adjacent to a subset of the vertices is often necessary in incremental
-> iterative computation and is a key part of the GraphX implementation of Pregel.
+<blockquote>
+<p>
+Note that <code>mapReduceTriplets</code> takes an additional optional <code>activeSet</code>
+(see API docs) which restricts the map phase to edges adjacent to the vertices in the provided
+<code>VertexRDD</code>:
+</p>
+{% highlight scala %}
+  activeSetOpt: Option[(VertexRDD[_], EdgeDirection)] = None
+{% endhighlight %}
+<p>
+The EdgeDirection specifies which edges adjacent to the vertex set are included in the map phase. If
+the direction is <code>In</code>, <code>mapFunc</code> will only be run only on edges with
+destination in the active set. If the direction is <code>Out</code>, <code>mapFunc</code> will only
+be run only on edges originating from vertices in the active set.  If the direction is
+<code>Either</code>, <code>mapFunc</code> will be run only on edges with <i>either</i> vertex in the
+active set.  If the direction is <code>Both</code>, <code>mapFunc</code> will be run only on edges
+with both vertices in the active set.  The active set must be derived from the set of vertices in
+the graph. Restricting computation to triplets adjacent to a subset of the vertices is often
+necessary in incremental iterative computation and is a key part of the GraphX implementation of
+Pregel.
+</p>
+</blockquote>
 
 In the following example we use the `mapReduceTriplets` operator to compute the average age of the
 more senior followers of each user.
@@ -565,15 +583,18 @@ Graphs are inherently recursive data-structures as properties of vertices depend
 their neighbors which intern depend on properties of *their* neighbors.  As a
 consequence many important graph algorithms iteratively recompute the properties of each vertex
 until a fixed-point condition is reached.  A range of graph-parallel abstractions have been proposed
-to express these iterative algorithms.  GraphX exposes a Pregel operator which is a fusion of
+to express these iterative algorithms.  GraphX exposes a Pregel-like operator which is a fusion of
 the widely used Pregel and GraphLab abstractions.
 
-At a high-level the GraphX variant of the Pregel abstraction is a bulk-synchronous parallel
-messaging abstraction constrained to the topology of the graph.  The Pregel operator executes in a
-series of super-steps in which vertices receive the sum of their inbound messages from the previous
-super-step, compute a new property value, and then send messages to neighboring vertices in the next
-super-step.  Vertices that do not receive a message are skipped within a super-step.  The Pregel
-operators terminates iteration and returns the final graph when there are no messages remaining.
+At a high-level the Pregel operator in GraphX is a bulk-synchronous parallel messaging abstraction
+*constrained to the topology of the graph*.  The Pregel operator executes in a series of super-steps
+in which vertices receive the *sum* of their inbound messages from the previous super- step, compute
+a new value for the vertex property, and then send messages to neighboring vertices in the next
+super-step.  Unlike Pregel and instead more like GraphLab messages are computed in parallel as a
+function of the edge triplet and the message computation has access to both the source and
+destination vertex attributes.  Vertices that do not receive a message are skipped within a super-
+step.  The Pregel operators terminates iteration and returns the final graph when there are no
+messages remaining.
 
 > Note, unlike more standard Pregel implementations, vertices in GraphX can only send messages to
 > neighboring vertices and the message construction is done in parallel using a user defined
@@ -588,7 +609,7 @@ def pregel[A]
      maxIter: Int = Int.MaxValue,
      activeDir: EdgeDirection = EdgeDirection.Out)
     (vprog: (VertexID, VD, A) => VD,
-     sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexID,A)],
+     sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexID, A)],
      mergeMsg: (A, A) => A)
   : Graph[VD, ED] = {
   // Receive the initial message at each vertex
author	Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com>	2014-01-13 18:40:35 -0800
committer	Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com>	2014-01-13 18:40:43 -0800
commit	552de5d42e395bad19f5d5fe6dcc1e678bb994a8 (patch)
tree	edaa2bcc04099bbf1b6139c42f1515975fc0849a /docs/graphx-programming-guide.md
parent	622b7f7d391375cced8633e4a2546dbca60a3907 (diff)
download	spark-552de5d42e395bad19f5d5fe6dcc1e678bb994a8.tar.gz spark-552de5d42e395bad19f5d5fe6dcc1e678bb994a8.tar.bz2 spark-552de5d42e395bad19f5d5fe6dcc1e678bb994a8.zip