author: WeichenXu <WeichenXu123@outlook.com> 2016-07-02 08:40:23 +0100
committer: Sean Owen <sowen@cloudera.com> 2016-07-02 08:40:23 +0100
commit: 192d1f9cf3463d050b87422939448f2acf86acc9
tree: c15ef89609de2fcf577d3eaefc078ed55fa0a511
parent: bad0f7dbba2eda149ee4fc5810674d971d17874a
[GRAPHX][EXAMPLES] move graphx test data directory and update graphx document
## What changes were proposed in this pull request?

The two test data files used by the GraphX examples lived in the "graphx/data" directory. This change moves them into the "data/" directory, because the "graphx" directory is meant for code files and the test data for other modules (such as MLlib and streaming) already lives there. It also updates the GraphX document wherever it references the moved data files.

## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14010 from WeichenXu123/move_graphx_data_dir.
-rw-r--r-- data/graphx/followers.txt (renamed from graphx/data/followers.txt) 0
-rw-r--r-- data/graphx/users.txt (renamed from graphx/data/users.txt) 0
-rw-r--r-- docs/graphx-programming-guide.md 18
3 files changed, 9 insertions, 9 deletions
diff --git a/graphx/data/followers.txt b/data/graphx/followers.txt
index 7bb8e900e2..7bb8e900e2 100644
--- a/graphx/data/followers.txt
+++ b/data/graphx/followers.txt
diff --git a/graphx/data/users.txt b/data/graphx/users.txt
index 982d19d50b..982d19d50b 100644
--- a/graphx/data/users.txt
+++ b/data/graphx/users.txt
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 81cf17475f..e376b6638e 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -1007,15 +1007,15 @@ PageRank measures the importance of each vertex in a graph, assuming an edge fro
GraphX comes with static and dynamic implementations of PageRank as methods on the [`PageRank` object][PageRank]. Static PageRank runs for a fixed number of iterations, while dynamic PageRank runs until the ranks converge (i.e., stop changing by more than a specified tolerance). [`GraphOps`][GraphOps] allows calling these algorithms directly as methods on `Graph`.
-GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in `graphx/data/users.txt`, and a set of relationships between users is given in `graphx/data/followers.txt`. We compute the PageRank of each user as follows:
+GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in `data/graphx/users.txt`, and a set of relationships between users is given in `data/graphx/followers.txt`. We compute the PageRank of each user as follows:
{% highlight scala %}
// Load the edges as a graph
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
// Run PageRank
val ranks = graph.pageRank(0.0001).vertices
// Join the ranks with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
val fields = line.split(",")
(fields(0).toLong, fields(1))
}
@@ -1032,11 +1032,11 @@ The connected components algorithm labels each connected component of the graph
{% highlight scala %}
// Load the graph as in the PageRank example
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
// Find the connected components
val cc = graph.connectedComponents().vertices
// Join the connected components with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
val fields = line.split(",")
(fields(0).toLong, fields(1))
}
@@ -1053,11 +1053,11 @@ A vertex is part of a triangle when it has two adjacent vertices with an edge be
{% highlight scala %}
// Load the edges in canonical order and partition the graph for triangle count
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt", true).partitionBy(PartitionStrategy.RandomVertexCut)
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt", true).partitionBy(PartitionStrategy.RandomVertexCut)
// Find the triangle count for each vertex
val triCounts = graph.triangleCount().vertices
// Join the triangle counts with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
val fields = line.split(",")
(fields(0).toLong, fields(1))
}
@@ -1081,11 +1081,11 @@ all of this in just a few lines with GraphX:
val sc = new SparkContext("spark://master.amplab.org", "research")
// Load my user data and parse into tuples of user id and attribute list
-val users = (sc.textFile("graphx/data/users.txt")
+val users = (sc.textFile("data/graphx/users.txt")
.map(line => line.split(",")).map( parts => (parts.head.toLong, parts.tail) ))
// Parse the edge data which is already in userId -> userId format
-val followerGraph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val followerGraph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
// Attach the user attributes
val graph = followerGraph.outerJoinVertices(users) {