path: root/README.md
author: Joey <joseph.e.gonzalez@gmail.com> 2013-10-29 21:31:12 -0700
committer: Joey <joseph.e.gonzalez@gmail.com> 2013-10-29 21:31:12 -0700
commit: 4f63b5e17f60c8b8d87027a91274428007d65263
tree: 81b2052e91b7109e88e8f9e2368201b3612a5a38 /README.md
parent: 1a20ba9b70f3a920c46c637c6dacda2efedf3cd0
Adding code example
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 43
1 files changed, 42 insertions, 1 deletions
diff --git a/README.md b/README.md
index a68c690b1b..ba31ed586d 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# GraphX: Unifying Graph and Tables
+# GraphX: Unifying Graphs and Tables
GraphX extends the distributed fault-tolerant collections API and
@@ -50,6 +50,47 @@ to interactively load, transform, and compute on massive graphs.
<img src="https://raw.github.com/jegonzal/graphx/Documentation/docs/img/tables_and_graphs.png" />
</p>
+## Examples
+
+Suppose I want to build a graph from some text files, restrict the graph
+to important relationships and users, run PageRank on the subgraph, and
+then return the attributes associated with the top users. I can do
+all of this in just a few lines with GraphX:
+
+```scala
+// Connect to the Spark cluster
+val sc = new SparkContext("spark://master.amplab.org", "research")
+
+// Load my user data and parse it into tuples of user id and attribute list
+val users = sc.textFile("hdfs://user_attributes.tsv")
+ .map(line => line.split("\t")).map( parts => (parts.head, parts.tail) )
+
+// Parse the edge data which is already in userId -> userId format
+val followerGraph = Graph.textFile(sc, "hdfs://followers.tsv")
+
+// Attach the user attributes
+val graph = followerGraph.outerJoinVertices(users){
+ case (uid, deg, Some(attrList)) => attrList
+ // Some users may not have attributes so we set them as empty
+ case (uid, deg, None) => Array.empty[String]
+ }
+
+// Restrict the graph to users which have exactly two attributes
+val subgraph = graph.subgraph((vid, attr) => attr.size == 2)
+
+// Compute the PageRank
+val pagerankGraph = Analytics.pagerank(subgraph)
+
+// Get the attributes of the top pagerank users
+val userInfoWithPageRank = subgraph.outerJoinVertices(pagerankGraph.vertices){
+ case (uid, attrList, Some(pr)) => (pr, attrList)
+ case (uid, attrList, None) => (0.0, attrList) // no PageRank for this user, so default to 0.0
+ }
+
+println(userInfoWithPageRank.vertices.top(5)(Ordering.by(_._2._1)).mkString("\n"))
+
+```
+
## Online Documentation