author: Ankur Dave <ankurdave@gmail.com> 2014-09-12 14:08:38 -0700
committer: Reynold Xin <rxin@apache.org> 2014-09-12 14:08:38 -0700
commit: 15a564598fe63003652b1e24527c432080b5976c
tree: f22e278781547cd00715814b556b7143ec5a8d55 /core/pom.xml
parent: eae81b0bfdf3159be90f507a03853800aec1874a
[SPARK-3427] [GraphX] Avoid active vertex tracking in static PageRank
GraphX's current implementation of static (fixed iteration count) PageRank uses the Pregel API. This unnecessarily tracks active vertices, even though in static PageRank all vertices are always active. Active vertex tracking incurs the following costs:

1. A shuffle per iteration to ship the active sets to the edge partitions.
2. A hash table creation per iteration at each partition to index the active sets for lookup.
3. A hash lookup per edge to check whether the source vertex is active.

I reimplemented static PageRank using the lower-level GraphX API instead of the Pregel API. In benchmarks on a 16-node m2.4xlarge cluster, this provided a 23% speedup (from 514 s to 397 s, mean over 3 trials) for 10 iterations of PageRank on a synthetic graph with 10M vertices and 1.27B edges.

Author: Ankur Dave <ankurdave@gmail.com>

Closes #2308 from ankurdave/SPARK-3427 and squashes the following commits:

449996a [Ankur Dave] Avoid unnecessary active vertex tracking in static PageRank
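
The costs attributed to active vertex tracking in the commit message can be illustrated with a minimal single-machine sketch in plain Python. This is not the GraphX code from the patch: the function names (`pagerank_with_active_set`, `pagerank_static`), the initial rank of 1.0, the reset probability of 0.15, and the update rule `reset_prob + (1 - reset_prob) * msg_sum` are all assumptions chosen for illustration. The per-iteration shuffle (cost 1) has no single-machine equivalent and is only noted in a comment.

```python
from collections import defaultdict


def out_degrees(edges):
    """Count outgoing edges per source vertex."""
    deg = defaultdict(int)
    for src, _dst in edges:
        deg[src] += 1
    return deg


def pagerank_with_active_set(edges, num_iter, reset_prob=0.15):
    """Hypothetical Pregel-style loop that tracks an active set every iteration."""
    vertices = {v for edge in edges for v in edge}
    deg = out_degrees(edges)
    ranks = {v: 1.0 for v in vertices}  # assumed initial rank
    active = list(vertices)
    for _ in range(num_iter):
        # Cost 1 (not modeled here): a distributed system would shuffle
        # the active set to the edge partitions at this point.
        active_index = set(active)  # cost 2: build a hash index per iteration
        msg_sum = defaultdict(float)
        for src, dst in edges:
            if src in active_index:  # cost 3: one hash lookup per edge
                msg_sum[dst] += ranks[src] / deg[src]
        ranks = {v: reset_prob + (1.0 - reset_prob) * msg_sum[v] for v in vertices}
        active = list(vertices)  # fixed iteration count: every vertex stays active
    return ranks


def pagerank_static(edges, num_iter, reset_prob=0.15):
    """Same computation with the active set removed entirely."""
    vertices = {v for edge in edges for v in edge}
    deg = out_degrees(edges)
    ranks = {v: 1.0 for v in vertices}  # assumed initial rank
    for _ in range(num_iter):
        msg_sum = defaultdict(float)
        for src, dst in edges:  # every edge sends a message, no membership check
            msg_sum[dst] += ranks[src] / deg[src]
        ranks = {v: reset_prob + (1.0 - reset_prob) * msg_sum[v] for v in vertices}
    return ranks
```

Because the active set always contains every vertex, the membership check never filters anything, and the two functions return the same ranks for the same input. Under the assumed update rule, a three-vertex cycle such as `[("a", "b"), ("b", "c"), ("c", "a")]` keeps every rank at 1.0 in both versions.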
Diffstat (limited to 'core/pom.xml')
0 files changed, 0 insertions, 0 deletions