author    Ankur Dave <ankurdave@gmail.com> 2014-01-10 11:37:10 -0800
committer Ankur Dave <ankurdave@gmail.com> 2014-01-10 11:37:10 -0800
commit    6bd9a78e78d42dc5c216af4b6f59a71a002f82e5 (patch)
tree      dee127765fa8429478641eb5f25f64a4038b2ce8
parent    cfc10c74a33cfd0997f53cb37053fd69193ee790 (diff)
Add back Bagel links to docs, but mark them superseded
 docs/_layouts/global.html        |  4
 docs/api.md                      |  3
 docs/bagel-programming-guide.md  | 10
 docs/graphx-programming-guide.md | 14
 docs/index.md                    |  4
 5 files changed, 21 insertions(+), 14 deletions(-)
diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index 7721854685..36eb49df14 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -67,6 +67,7 @@
<li class="divider"></li>
<li><a href="streaming-programming-guide.html">Spark Streaming</a></li>
<li><a href="mllib-guide.html">MLlib (Machine Learning)</a></li>
+ <li><a href="bagel-programming-guide.html">Bagel (Pregel on Spark, superseded by GraphX)</a></li>
<li><a href="graphx-programming-guide.html">GraphX (Graph-Parallel Spark)</a></li>
</ul>
</li>
@@ -79,7 +80,8 @@
<li class="divider"></li>
<li><a href="api/streaming/index.html#org.apache.spark.streaming.package">Spark Streaming</a></li>
<li><a href="api/mllib/index.html#org.apache.spark.mllib.package">MLlib (Machine Learning)</a></li>
- <li><a href="api/graphx/index.html#org.apache.spark.graphx.package">GraphX (Graph-Paralle Spark)</a></li>
+ <li><a href="api/bagel/index.html#org.apache.spark.bagel.package">Bagel (Pregel on Spark, superseded by GraphX)</a></li>
+ <li><a href="api/graphx/index.html#org.apache.spark.graphx.package">GraphX (Graph-Parallel Spark)</a></li>
</ul>
</li>
diff --git a/docs/api.md b/docs/api.md
index e86d07770a..7639e58053 100644
--- a/docs/api.md
+++ b/docs/api.md
@@ -8,5 +8,6 @@ Here you can find links to the Scaladoc generated for the Spark sbt subprojects.
- [Spark](api/core/index.html)
- [Spark Examples](api/examples/index.html)
- [Spark Streaming](api/streaming/index.html)
-- [Bagel](api/bagel/index.html)
+- [Bagel](api/bagel/index.html) *(superseded by GraphX)*
+- [GraphX](api/graphx/index.html)
- [PySpark](api/pyspark/index.html)
diff --git a/docs/bagel-programming-guide.md b/docs/bagel-programming-guide.md
index c4f1f6d6ad..a1339ec735 100644
--- a/docs/bagel-programming-guide.md
+++ b/docs/bagel-programming-guide.md
@@ -3,6 +3,8 @@ layout: global
title: Bagel Programming Guide
---
+**Bagel has been superseded by [GraphX](graphx-programming-guide.html) for graph processing. New users should use GraphX instead.**
+
Bagel is a Spark implementation of Google's [Pregel](http://portal.acm.org/citation.cfm?id=1807184) graph processing framework. Bagel currently supports basic graph computation, combiners, and aggregators.
In the Pregel programming model, jobs run as a sequence of iterations called _supersteps_. In each superstep, each vertex in the graph runs a user-specified function that can update state associated with the vertex and send messages to other vertices for use in the *next* iteration.
@@ -21,7 +23,7 @@ To use Bagel in your program, add the following SBT or Maven dependency:
Bagel operates on a graph represented as a [distributed dataset](scala-programming-guide.html) of (K, V) pairs, where keys are vertex IDs and values are vertices plus their associated state. In each superstep, Bagel runs a user-specified compute function on each vertex that takes as input the current vertex state and a list of messages sent to that vertex during the previous superstep, and returns the new vertex state and a list of outgoing messages.
-For example, we can use Bagel to implement PageRank. Here, vertices represent pages, edges represent links between pages, and messages represent shares of PageRank sent to the pages that a particular page links to.
+For example, we can use Bagel to implement PageRank. Here, vertices represent pages, edges represent links between pages, and messages represent shares of PageRank sent to the pages that a particular page links to.
We first extend the default `Vertex` class to store a `Double`
representing the current PageRank of the vertex, and similarly extend
@@ -38,7 +40,7 @@ import org.apache.spark.bagel.Bagel._
val active: Boolean) extends Vertex
@serializable class PRMessage(
- val targetId: String, val rankShare: Double) extends Message
+ val targetId: String, val rankShare: Double) extends Message
{% endhighlight %}
Next, we load a sample graph from a text file as a distributed dataset and package it into `PRVertex` objects. We also cache the distributed dataset because Bagel will use it multiple times and we'd like to avoid recomputing it.
@@ -114,7 +116,7 @@ Here are the actions and types in the Bagel API. See [Bagel.scala](https://githu
/*** Full form ***/
Bagel.run(sc, vertices, messages, combiner, aggregator, partitioner, numSplits)(compute)
-// where compute takes (vertex: V, combinedMessages: Option[C], aggregated: Option[A], superstep: Int)
+// where compute takes (vertex: V, combinedMessages: Option[C], aggregated: Option[A], superstep: Int)
// and returns (newVertex: V, outMessages: Array[M])
/*** Abbreviated forms ***/
@@ -124,7 +126,7 @@ Bagel.run(sc, vertices, messages, combiner, partitioner, numSplits)(compute)
// and returns (newVertex: V, outMessages: Array[M])
Bagel.run(sc, vertices, messages, combiner, numSplits)(compute)
-// where compute takes (vertex: V, combinedMessages: Option[C], superstep: Int)
+// where compute takes (vertex: V, combinedMessages: Option[C], superstep: Int)
// and returns (newVertex: V, outMessages: Array[M])
Bagel.run(sc, vertices, messages, numSplits)(compute)
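The hunks above describe Bagel's per-vertex compute contract: each superstep, a vertex receives its prior state plus the (optionally combined) messages from the previous superstep, and returns its new state plus outgoing messages. A minimal sketch of that contract for the PageRank example, in plain Scala — the `PRVertex`, `PRMessage`, and `compute` names mirror the guide's example but are simplified stand-ins, not the actual `org.apache.spark.bagel` API:

```scala
// Hypothetical, simplified sketch of the Bagel compute contract from the
// guide above; this is plain Scala with no Spark dependency.
case class PRVertex(id: String, rank: Double, outEdges: Seq[String], active: Boolean)
case class PRMessage(targetId: String, rankShare: Double)

// One superstep for a single vertex: sum the incoming rank shares, apply
// standard 0.15/0.85 PageRank damping, and emit one share per out-edge.
def compute(self: PRVertex, msgs: Option[Seq[PRMessage]], superstep: Int)
    : (PRVertex, Seq[PRMessage]) = {
  val incoming = msgs.map(_.map(_.rankShare).sum).getOrElse(self.rank)
  val newRank  = 0.15 + 0.85 * incoming
  val share    = newRank / math.max(self.outEdges.size, 1)
  (self.copy(rank = newRank), self.outEdges.map(PRMessage(_, share)))
}
```

In the real API, a function of this shape is the curried `compute` argument to one of the `Bagel.run` forms listed above; Bagel handles grouping messages by vertex ID and iterating supersteps until vertices halt.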
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index a551e4306d..8ae5f17e12 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -16,7 +16,7 @@ title: GraphX Programming Guide
# Overview
GraphX is the new (alpha) Spark API for graphs and graph-parallel
-computation. At a high-level GraphX, extends the Spark
+computation. At a high-level, GraphX extends the Spark
[RDD](api/core/index.html#org.apache.spark.rdd.RDD) by
introducing the [Resilient Distributed property Graph (RDG)](#property_graph):
a directed graph with properties attached to each vertex and edge.
@@ -77,12 +77,13 @@ graph-parallel systems while easily expressing the entire analytics pipelines.
## GraphX Replaces the Spark Bagel API
Prior to the release of GraphX, graph computation in Spark was expressed using
-Bagel, an implementation of the Pregel API. GraphX improves upon Bagel by exposing
-a richer property graph API, a more streamlined version of the Pregel abstraction,
-and system optimizations to improve performance and reduce memory
+Bagel, an implementation of the Pregel API. GraphX improves upon Bagel by
+exposing a richer property graph API, a more streamlined version of the Pregel
+abstraction, and system optimizations to improve performance and reduce memory
overhead. While we plan to eventually deprecate the Bagel, we will continue to
-support the API and [Bagel programming guide](bagel-programming-guide.html). However,
-we encourage Bagel to explore the new GraphX API and comment on issues that may
+support the [Bagel API](api/bagel/index.html#org.apache.spark.bagel.package) and
+[Bagel programming guide](bagel-programming-guide.html). However, we encourage
+Bagel users to explore the new GraphX API and comment on issues that may
complicate the transition from Bagel.
# The Property Graph
@@ -168,4 +169,3 @@ val userInfoWithPageRank = subgraph.outerJoinVertices(pagerankGraph.vertices){
println(userInfoWithPageRank.top(5))
{% endhighlight %}
-
diff --git a/docs/index.md b/docs/index.md
index 7228809738..c11dc38b0e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -77,7 +77,8 @@ For this version of Spark (0.8.1) Hadoop 2.2.x (or newer) users will have to bui
* [Python Programming Guide](python-programming-guide.html): using Spark from Python
* [Spark Streaming](streaming-programming-guide.html): using the alpha release of Spark Streaming
* [MLlib (Machine Learning)](mllib-guide.html): Spark's built-in machine learning library
-* [GraphX (Graphs on Spark)](graphx-programming-guide.html): simple graph processing model
+* [Bagel (Pregel on Spark)](bagel-programming-guide.html): simple graph processing model *(superseded by GraphX)*
+* [GraphX (Graphs on Spark)](graphx-programming-guide.html): Spark's new API for graphs
**API Docs:**
@@ -85,6 +86,7 @@ For this version of Spark (0.8.1) Hadoop 2.2.x (or newer) users will have to bui
* [Spark for Python (Epydoc)](api/pyspark/index.html)
* [Spark Streaming for Java/Scala (Scaladoc)](api/streaming/index.html)
* [MLlib (Machine Learning) for Java/Scala (Scaladoc)](api/mllib/index.html)
+* [Bagel (Pregel on Spark) for Scala (Scaladoc)](api/bagel/index.html) *(superseded by GraphX)*
* [GraphX (Graphs on Spark) for Scala (Scaladoc)](api/graphx/index.html)