author: Andrew Ray <ray.andrew@gmail.com> 2017-03-17 14:23:07 -0700
committer: Reynold Xin <rxin@databricks.com> 2017-03-17 14:23:07 -0700
commit: bfdeea5c68f963ce60d48d0aa4a4c8c582169950
tree: f9ec401b96edd0cfc5f7ad36d3c65b57ea7aeabe /sql/core/src
parent: 376d782164437573880f0ad58cecae1cb5f212f2
[SPARK-18847][GRAPHX] PageRank gives incorrect results for graphs with sinks
## What changes were proposed in this pull request?

Graphs with sinks (vertices with no outgoing edges) don't have the expected rank sum of n (or 1 for personalized PageRank). We fix this by normalizing to the expected sum at the end of each implementation.

Additionally, this fixes the dynamic version of personalized PageRank, which gave incorrect answers that were not detected by existing unit tests.

## How was this patch tested?

Revamped existing unit tests and added new ones, with reference values (and reproduction code) from igraph and NetworkX.

Note that for comparison on personalized PageRank we use the ARPACK algorithm in igraph, because PRPACK (the current default) redistributes rank to all vertices uniformly instead of just to the personalization source. We could take the alternate convention (redistribute rank to all vertices uniformly), but that would involve more extensive changes to the algorithms (the dynamic version would no longer be able to use Pregel).

Author: Andrew Ray <ray.andrew@gmail.com>

Closes #16483 from aray/pagerank-sink2.
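To illustrate the sink problem and the normalization described above, here is a minimal pure-Python sketch. It is not the GraphX code from this patch: the function names `pagerank_unnormalized` and `normalize_rank_sum`, the update rule `reset_prob + (1 - reset_prob) * incoming`, and the three-vertex chain graph are all assumptions chosen for illustration. The sketch only shows that rank leaks out of a sink, and that rescaling restores the expected sum.

```python
def pagerank_unnormalized(edges, n, reset_prob=0.15, num_iter=50):
    """Fixed-iteration PageRank (illustrative update rule, assumed here).

    Each vertex splits its rank evenly over its outgoing edges.
    A sink has no outgoing edges, so its rank is never passed on
    and the total rank drops below n.
    """
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1

    ranks = [1.0] * n
    for _ in range(num_iter):
        incoming = [0.0] * n
        for src, dst in edges:
            incoming[dst] += ranks[src] / out_degree[src]
        ranks = [reset_prob + (1.0 - reset_prob) * x for x in incoming]
    return ranks


def normalize_rank_sum(ranks, expected_sum):
    """Rescale ranks so that they add up to expected_sum."""
    total = sum(ranks)
    return [r * expected_sum / total for r in ranks]


# Hypothetical example graph: 0 -> 1 -> 2, where vertex 2 is a sink.
n = 3
edges = [(0, 1), (1, 2)]

raw = pagerank_unnormalized(edges, n)
fixed = normalize_rank_sum(raw, expected_sum=n)

print("raw sum:  ", sum(raw))    # falls short of n because of the sink
print("fixed sum:", sum(fixed))  # rescaled back to n
```

For the personalized variant, the same rescaling would be called with `expected_sum=1`, matching the expected sum stated in the commit message. Rescaling preserves the relative order of the ranks, so only their overall scale changes.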
Diffstat (limited to 'sql/core/src')
0 files changed, 0 insertions, 0 deletions