author: Reynold Xin <rxin@apache.org> 2014-03-26 00:09:44 -0700
committer: Reynold Xin <rxin@apache.org> 2014-03-26 00:09:44 -0700
commit: b859853ba47b6323af0e31a4e2099e943221e1b1 (patch)
tree: 50ab4dd5357e772354c7f3d2dd5f19c47c79a630 /python
parent: 4f7d547b85ed89ba4706e05d7d0984f16749120e (diff)
download: spark-b859853ba47b6323af0e31a4e2099e943221e1b1.tar.gz
spark-b859853ba47b6323af0e31a4e2099e943221e1b1.tar.bz2
spark-b859853ba47b6323af0e31a4e2099e943221e1b1.zip
SPARK-1321 Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation
Also updated the documentation for top and takeOrdered.

On my simple test of sorting 100 million (Int, Int) tuples using Spark, Guava's top k implementation (in Ordering) is much faster than the BoundedPriorityQueue implementation for roughly sorted input (10 - 20X faster), and still faster for purely random input (2 - 5X).

Author: Reynold Xin <rxin@apache.org>

Closes #229 from rxin/takeOrdered and squashes the following commits:

0d11844 [Reynold Xin] Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation. Also updated the documentation for top and takeOrdered.
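
For readers unfamiliar with the two approaches being compared, here is a minimal Python sketch of top k selection. It is not the Scala code touched by this commit: `top_k_bounded_heap` is a hypothetical stand-in for a bounded priority queue, and `top_k_library` uses the standard library's `heapq.nlargest` only as an example of delegating to a library routine. It makes no claim about how Guava's Ordering implements top k, and it does not reproduce the timings quoted above.

```python
import heapq
import random


def top_k_bounded_heap(items, k):
    """Return the k largest items, largest first, using a size-bounded min-heap.

    Hypothetical illustration of a bounded-priority-queue approach:
    the heap never grows beyond k elements.
    """
    if k <= 0:
        return []
    heap = []
    for item in items:
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # The smallest retained item is evicted in favour of the new one.
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


def top_k_library(items, k):
    """Return the k largest items, largest first, via the standard library."""
    return heapq.nlargest(k, items)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the check is repeatable
    pairs = [(rng.randint(0, 1000), rng.randint(0, 1000)) for _ in range(10000)]
    expected = sorted(pairs, reverse=True)[:5]
    assert top_k_bounded_heap(pairs, 5) == expected
    assert top_k_library(pairs, 5) == expected
```

Both functions return the same result; they differ only in how much work is done per element, which is the kind of difference the measurements in the commit message refer to.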
Diffstat (limited to 'python')
0 files changed, 0 insertions, 0 deletions