author: Andrew Ray <ray.andrew@gmail.com> 2016-12-15 23:32:10 -0800
committer: Ankur Dave <ankurdave@gmail.com> 2016-12-15 23:32:10 -0800
commit: 78062b8521bb02900baeec31992d697fa677f122
tree: 74f04bb87417c6e10280fd839295295ddf1f0f90 /sql
parent: 172a52f5d31337d90155feb7072381e8d5712288
[SPARK-18845][GRAPHX] PageRank has incorrect initialization value that leads to slow convergence
## What changes were proposed in this pull request?

Change the initial value in all PageRank implementations to be `1.0` instead of `resetProb` (default `0.15`) and use `outerJoinVertices` instead of `joinVertices` so that source vertices get updated in each iteration.

The incorrect initial value seems to have been introduced a long time ago in https://github.com/apache/spark/commit/15a564598fe63003652b1e24527c432080b5976c#diff-b2bf3f97dcd2f19d61c921836159cda9L90

With the exception of graphs with sinks (which currently give incorrect results, see SPARK-18847), this gives faster convergence because the sum of ranks is already correct from the first iteration (the sum of ranks should equal the number of vertices).

Convergence comparison benchmark for a small graph: http://imgur.com/a/HkkZf

Code for benchmark: https://gist.github.com/aray/a7de1f3801a810f8b1fa00c271a1fefd

## How was this patch tested?

(corrected) existing unit tests and an additional test that verifies against the results of igraph and NetworkX on a loop with a source.

Author: Andrew Ray <ray.andrew@gmail.com>

Closes #16271 from aray/pagerank-initial-value.
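The claim about the sum of ranks can be illustrated with a minimal, self-contained sketch in plain Python. This is not the GraphX code: the function name `pagerank_sums`, the edge-list representation, and the five-vertex example graph `LOOP_WITH_SOURCE` are assumptions made for illustration only. It uses the unnormalized update `rank = resetProb + (1 - resetProb) * sum(incoming)` and assumes a graph with no sinks.

```python
def pagerank_sums(edges, num_vertices, initial, reset_prob=0.15, iterations=10):
    """Run unnormalized PageRank for a fixed number of iterations.

    Returns (final ranks, list of the rank sum after each iteration).
    Assumes every vertex has at least one out-edge (no sinks).
    """
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    ranks = [initial] * num_vertices
    sums = []
    for _ in range(iterations):
        # Each vertex sends rank / out_degree along every out-edge.
        incoming = [0.0] * num_vertices
        for src, dst in edges:
            incoming[dst] += ranks[src] / out_degree[src]
        # Every vertex is updated, including sources that received nothing;
        # this mirrors the intent of switching to outerJoinVertices.
        ranks = [reset_prob + (1.0 - reset_prob) * msg for msg in incoming]
        sums.append(sum(ranks))
    return ranks, sums


# Hypothetical example graph: source vertex 0 feeding the loop 1 -> 2 -> 3 -> 4 -> 1.
LOOP_WITH_SOURCE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1)]

if __name__ == "__main__":
    for initial in (1.0, 0.15):
        _, sums = pagerank_sums(LOOP_WITH_SOURCE, 5, initial, iterations=5)
        print(f"initial={initial}: rank sums per iteration = {sums}")
```

Under these assumptions, starting from `1.0` keeps the rank sum at the number of vertices in every iteration, whereas starting from `resetProb` begins well below it and only approaches it geometrically. Both starting points reach the same fixed point eventually.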
Diffstat (limited to 'sql')
0 files changed, 0 insertions, 0 deletions