author    Ankur Dave <ankurdave@gmail.com>    2014-01-10 11:37:10 -0800
committer Ankur Dave <ankurdave@gmail.com>    2014-01-10 11:37:10 -0800
commit    6bd9a78e78d42dc5c216af4b6f59a71a002f82e5 (patch)
tree      dee127765fa8429478641eb5f25f64a4038b2ce8 /docs
parent    cfc10c74a33cfd0997f53cb37053fd69193ee790 (diff)
Add back Bagel links to docs, but mark them superseded
Diffstat (limited to 'docs')
-rwxr-xr-x  docs/_layouts/global.html         |  4
-rw-r--r--  docs/api.md                       |  3
-rw-r--r--  docs/bagel-programming-guide.md   | 10
-rw-r--r--  docs/graphx-programming-guide.md  | 14
-rw-r--r--  docs/index.md                     |  4
5 files changed, 21 insertions, 14 deletions
diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index 7721854685..36eb49df14 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -67,6 +67,7 @@
 <li class="divider"></li>
 <li><a href="streaming-programming-guide.html">Spark Streaming</a></li>
 <li><a href="mllib-guide.html">MLlib (Machine Learning)</a></li>
+<li><a href="bagel-programming-guide.html">Bagel (Pregel on Spark, superseded by GraphX)</a></li>
 <li><a href="graphx-programming-guide.html">GraphX (Graph-Parallel Spark)</a></li>
 </ul>
 </li>
@@ -79,7 +80,8 @@
 <li class="divider"></li>
 <li><a href="api/streaming/index.html#org.apache.spark.streaming.package">Spark Streaming</a></li>
 <li><a href="api/mllib/index.html#org.apache.spark.mllib.package">MLlib (Machine Learning)</a></li>
-<li><a href="api/graphx/index.html#org.apache.spark.graphx.package">GraphX (Graph-Paralle Spark)</a></li>
+<li><a href="api/bagel/index.html#org.apache.spark.bagel.package">Bagel (Pregel on Spark, superseded by GraphX)</a></li>
+<li><a href="api/graphx/index.html#org.apache.spark.graphx.package">GraphX (Graph-Parallel Spark)</a></li>
 </ul>
 </li>
diff --git a/docs/api.md b/docs/api.md
index e86d07770a..7639e58053 100644
--- a/docs/api.md
+++ b/docs/api.md
@@ -8,5 +8,6 @@ Here you can find links to the Scaladoc generated for the Spark sbt subprojects.
 - [Spark](api/core/index.html)
 - [Spark Examples](api/examples/index.html)
 - [Spark Streaming](api/streaming/index.html)
-- [Bagel](api/bagel/index.html)
+- [Bagel](api/bagel/index.html) *(superseded by GraphX)*
+- [GraphX](api/graphx/index.html)
 - [PySpark](api/pyspark/index.html)
diff --git a/docs/bagel-programming-guide.md b/docs/bagel-programming-guide.md
index c4f1f6d6ad..a1339ec735 100644
--- a/docs/bagel-programming-guide.md
+++ b/docs/bagel-programming-guide.md
@@ -3,6 +3,8 @@ layout: global
 title: Bagel Programming Guide
 ---
 
+**Bagel has been superseded by [GraphX](graphx-programming-guide.html) for graph processing. New users should use GraphX instead.**
+
 Bagel is a Spark implementation of Google's [Pregel](http://portal.acm.org/citation.cfm?id=1807184) graph processing framework. Bagel currently supports basic graph computation, combiners, and aggregators.
 
 In the Pregel programming model, jobs run as a sequence of iterations called _supersteps_. In each superstep, each vertex in the graph runs a user-specified function that can update state associated with the vertex and send messages to other vertices for use in the *next* iteration.
@@ -21,7 +23,7 @@ To use Bagel in your program, add the following SBT or Maven dependency:
 
 Bagel operates on a graph represented as a [distributed dataset](scala-programming-guide.html) of (K, V) pairs, where keys are vertex IDs and values are vertices plus their associated state. In each superstep, Bagel runs a user-specified compute function on each vertex that takes as input the current vertex state and a list of messages sent to that vertex during the previous superstep, and returns the new vertex state and a list of outgoing messages.
 
-For example, we can use Bagel to implement PageRank. Here, vertices represent pages, edges represent links between pages, and messages represent shares of PageRank sent to the pages that a particular page links to. 
+For example, we can use Bagel to implement PageRank. Here, vertices represent pages, edges represent links between pages, and messages represent shares of PageRank sent to the pages that a particular page links to.
 We first extend the default `Vertex` class to store a `Double`
 representing the current PageRank of the vertex, and similarly extend
@@ -38,7 +40,7 @@ import org.apache.spark.bagel.Bagel._
   val active: Boolean) extends Vertex
 
 @serializable class PRMessage(
-  val targetId: String, val rankShare: Double) extends Message 
+  val targetId: String, val rankShare: Double) extends Message
 {% endhighlight %}
 
 Next, we load a sample graph from a text file as a distributed dataset and package it into `PRVertex` objects. We also cache the distributed dataset because Bagel will use it multiple times and we'd like to avoid recomputing it.
@@ -114,7 +116,7 @@ Here are the actions and types in the Bagel API. See [Bagel.scala](https://githu
 /*** Full form ***/
 
 Bagel.run(sc, vertices, messages, combiner, aggregator, partitioner, numSplits)(compute)
-// where compute takes (vertex: V, combinedMessages: Option[C], aggregated: Option[A], superstep: Int) 
+// where compute takes (vertex: V, combinedMessages: Option[C], aggregated: Option[A], superstep: Int)
 // and returns (newVertex: V, outMessages: Array[M])
 
 /*** Abbreviated forms ***/
@@ -124,7 +126,7 @@ Bagel.run(sc, vertices, messages, combiner, partitioner, numSplits)(compute)
 // and returns (newVertex: V, outMessages: Array[M])
 
 Bagel.run(sc, vertices, messages, combiner, numSplits)(compute)
-// where compute takes (vertex: V, combinedMessages: Option[C], superstep: Int) 
+// where compute takes (vertex: V, combinedMessages: Option[C], superstep: Int)
 // and returns (newVertex: V, outMessages: Array[M])
 
 Bagel.run(sc, vertices, messages, numSplits)(compute)
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index a551e4306d..8ae5f17e12 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -16,7 +16,7 @@ title: GraphX Programming Guide
 # Overview
 
 GraphX is the new (alpha) Spark API for graphs and graph-parallel
-computation. At a high-level GraphX, extends the Spark
+computation. At a high-level, GraphX extends the Spark
 [RDD](api/core/index.html#org.apache.spark.rdd.RDD) by introducing the
 [Resilient Distributed property Graph (RDG)](#property_graph): a directed graph
 with properties attached to each vertex and edge.
@@ -77,12 +77,13 @@ graph-parallel systems while easily expressing the entire analytics pipelines.
 ## GraphX Replaces the Spark Bagel API
 
 Prior to the release of GraphX, graph computation in Spark was expressed using
-Bagel, an implementation of the Pregel API. GraphX improves upon Bagel by exposing
-a richer property graph API, a more streamlined version of the Pregel abstraction,
-and system optimizations to improve performance and reduce memory
+Bagel, an implementation of the Pregel API. GraphX improves upon Bagel by
+exposing a richer property graph API, a more streamlined version of the Pregel
+abstraction, and system optimizations to improve performance and reduce memory
 overhead. While we plan to eventually deprecate the Bagel, we will continue to
-support the API and [Bagel programming guide](bagel-programming-guide.html). However,
-we encourage Bagel to explore the new GraphX API and comment on issues that may
+support the [Bagel API](api/bagel/index.html#org.apache.spark.bagel.package) and
+[Bagel programming guide](bagel-programming-guide.html). However, we encourage
+Bagel users to explore the new GraphX API and comment on issues that may
 complicate the transition from Bagel.
 
 # The Property Graph
@@ -168,4 +169,3 @@ val userInfoWithPageRank = subgraph.outerJoinVertices(pagerankGraph.vertices){
 
 println(userInfoWithPageRank.top(5))
 {% endhighlight %}
-
diff --git a/docs/index.md b/docs/index.md
index 7228809738..c11dc38b0e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -77,7 +77,8 @@ For this version of Spark (0.8.1) Hadoop 2.2.x (or newer) users will have to bui
 * [Python Programming Guide](python-programming-guide.html): using Spark from Python
 * [Spark Streaming](streaming-programming-guide.html): using the alpha release of Spark Streaming
 * [MLlib (Machine Learning)](mllib-guide.html): Spark's built-in machine learning library
-* [GraphX (Graphs on Spark)](graphx-programming-guide.html): simple graph processing model
+* [Bagel (Pregel on Spark)](bagel-programming-guide.html): simple graph processing model *(superseded by GraphX)*
+* [GraphX (Graphs on Spark)](graphx-programming-guide.html): Spark's new API for graphs
 
 **API Docs:**
 
@@ -85,6 +86,7 @@ For this version of Spark (0.8.1) Hadoop 2.2.x (or newer) users will have to bui
 * [Spark for Python (Epydoc)](api/pyspark/index.html)
 * [Spark Streaming for Java/Scala (Scaladoc)](api/streaming/index.html)
 * [MLlib (Machine Learning) for Java/Scala (Scaladoc)](api/mllib/index.html)
+* [Bagel (Pregel on Spark) for Scala (Scaladoc)](api/bagel/index.html) *(superseded by GraphX)*
 * [GraphX (Graphs on Spark) for Scala (Scaladoc)](api/graphx/index.html)