Adding code example

author: Joey <joseph.e.gonzalez@gmail.com> 2013-10-29 21:31:12 -0700
committer: Joey <joseph.e.gonzalez@gmail.com> 2013-10-29 21:31:12 -0700
commit: 4f63b5e17f60c8b8d87027a91274428007d65263 (patch)
tree: 81b2052e91b7109e88e8f9e2368201b3612a5a38 /README.md
parent: 1a20ba9b70f3a920c46c637c6dacda2efedf3cd0 (diff)
download: spark-4f63b5e17f60c8b8d87027a91274428007d65263.tar.gz
spark-4f63b5e17f60c8b8d87027a91274428007d65263.tar.bz2
spark-4f63b5e17f60c8b8d87027a91274428007d65263.zip
1 files changed, 42 insertions, 1 deletions
diff --git a/README.md b/README.md
index a68c690b1b..ba31ed586d 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# GraphX: Unifying Graph and Tables
+# GraphX: Unifying Graphs and Tables
 
 
 GraphX extends the distributed fault-tolerant collections API and
@@ -50,6 +50,47 @@ to interactively load, transform, and compute on massive graphs.
   <img src="https://raw.github.com/jegonzal/graphx/Documentation/docs/img/tables_and_graphs.png" />
 </p>
 
+## Examples
+
+Suppose I want to build a graph from some text files, restrict the graph 
+to important relationships and users, run page-rank on the sub-graph, and
+then finally return attributes associated with the top users.  I can do 
+all of this in just a few lines with GraphX:
+
+```scala
+// Connect to the Spark cluster
+val sc = new SparkContext("spark://master.amplab.org", "research")
+
+// Load my user data and prase into tuples of user id and attribute list
+val users = sc.textFile("hdfs://user_attributes.tsv")
+  .map(line => line.split).map( parts => (parts.head, parts.tail) )
+
+// Parse the edge data which is already in userId -> userId format
+val followerGraph = Graph.textFile(sc, "hdfs://followers.tsv")
+
+// Attach the user attributes
+val graph = followerGraph.outerJoinVertices(users){ 
+  case (uid, deg, Some(attrList)) => attrList
+  // Some users may not have attributes so we set them as empty
+  case (uid, deg, None) => Array.empty[String] 
+  }
+
+// Restrict the graph to users which have exactly two attributes
+val subgraph = graph.subgraph((vid, attr) => attr.size == 2)
+
+// Compute the PageRank 
+val pagerankGraph = Analytics.pagerank(subgraph)
+
+// Get the attributes of the top pagerank users
+val userInfoWithPageRank = subgraph.outerJoinVertices(pagerankGraph.vertices){
+  case (uid, attrList, Some(pr)) => (pr, attrList)
+  case (uid, attrList, None) => (pr, attrList)
+  }
+  
+println(userInfoWithPageRank.top(5))
+
+```
+
 
 ## Online Documentation
author	Joey <joseph.e.gonzalez@gmail.com>	2013-10-29 21:31:12 -0700
committer	Joey <joseph.e.gonzalez@gmail.com>	2013-10-29 21:31:12 -0700
commit	4f63b5e17f60c8b8d87027a91274428007d65263 (patch)
tree	81b2052e91b7109e88e8f9e2368201b3612a5a38 /README.md
parent	1a20ba9b70f3a920c46c637c6dacda2efedf3cd0 (diff)
download	spark-4f63b5e17f60c8b8d87027a91274428007d65263.tar.gz spark-4f63b5e17f60c8b8d87027a91274428007d65263.tar.bz2 spark-4f63b5e17f60c8b8d87027a91274428007d65263.zip