SPARK-1321 Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation - spark

author	Reynold Xin <rxin@apache.org>	2014-03-26 00:09:44 -0700
committer	Reynold Xin <rxin@apache.org>	2014-03-26 00:09:44 -0700
commit	b859853ba47b6323af0e31a4e2099e943221e1b1 (patch)
tree	50ab4dd5357e772354c7f3d2dd5f19c47c79a630 /python
parent	4f7d547b85ed89ba4706e05d7d0984f16749120e (diff)

SPARK-1321 Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation

Also updated the documentation for top and takeOrdered.

On my simple test of sorting 100 million (Int, Int) tuples using Spark, Guava's top k implementation (in Ordering) is much faster than the BoundedPriorityQueue implementation for roughly sorted input (10 - 20X faster), and still faster for purely random input (2 - 5X).

Author: Reynold Xin <rxin@apache.org>

Closes #229 from rxin/takeOrdered and squashes the following commits:

0d11844 [Reynold Xin] Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation. Also updated the documentation for top and takeOrdered.
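For readers unfamiliar with the approach being replaced, here is a minimal sketch of bounded-priority-queue top k, written in plain Java with only the standard library. It is not Spark's BoundedPriorityQueue code and not Guava's algorithm; the class name BoundedTopK and its method signatures are hypothetical and chosen only to mirror the takeOrdered (k smallest) and top (k largest) operations named in the message. The Guava replacement referred to above is presumably Ordering.leastOf / Ordering.greatestOf, which this sketch does not call.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: not Spark's BoundedPriorityQueue and not Guava's Ordering.
final class BoundedTopK {
    private BoundedTopK() {}

    /** Returns the k smallest elements under cmp, in ascending order. */
    static <T> List<T> takeOrdered(Iterable<T> items, int k, Comparator<? super T> cmp) {
        List<T> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Max-heap under cmp: the head is the largest of the (at most k) elements kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(k, cmp.reversed());
        for (T item : items) {
            if (heap.size() < k) {
                heap.add(item);
            } else if (cmp.compare(item, heap.peek()) < 0) {
                // The new item beats the current worst survivor, so swap it in.
                heap.poll();
                heap.add(item);
            }
        }
        // Heap iteration order is unspecified, so sort the k survivors before returning.
        result.addAll(heap);
        result.sort(cmp);
        return result;
    }

    /** Returns the k largest elements under cmp, in descending order. */
    static <T> List<T> top(Iterable<T> items, int k, Comparator<? super T> cmp) {
        return takeOrdered(items, k, cmp.reversed());
    }
}
```

Each element costs at most one comparison against the heap head plus an O(log k) heap update when it displaces the head, so a pass over n elements is O(n log k) in the worst case. The speedups quoted in the message are the author's own measurements against Spark's implementation and are not reproduced by this sketch.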

Diffstat (limited to 'python')

0 files changed, 0 insertions, 0 deletions

