field | value | date |
---|---|---|
author | Reynold Xin <rxin@apache.org> | 2014-03-26 00:09:44 -0700 |
committer | Reynold Xin <rxin@apache.org> | 2014-03-26 00:09:44 -0700 |
commit | b859853ba47b6323af0e31a4e2099e943221e1b1 | |
tree | 50ab4dd5357e772354c7f3d2dd5f19c47c79a630 /dev | |
parent | 4f7d547b85ed89ba4706e05d7d0984f16749120e | |
SPARK-1321 Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation
Also updated the documentation for top and takeOrdered.
In my simple test of sorting 100 million (Int, Int) tuples using Spark, Guava's top k implementation (in Ordering) is much faster than the BoundedPriorityQueue implementation: 10-20x faster for roughly sorted input, and still 2-5x faster for purely random input.
Author: Reynold Xin <rxin@apache.org>
Closes #229 from rxin/takeOrdered and squashes the following commits:
0d11844 [Reynold Xin] Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation. Also updated the documentation for top and takeOrdered.
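
For readers unfamiliar with the approach being replaced, here is a minimal sketch of the bounded-priority-queue idea behind a top k operation. It is illustrative only: it is neither Spark's BoundedPriorityQueue code nor Guava's algorithm, and the function name `top` and its `key` parameter are hypothetical names chosen for this example. Python is used purely for brevity.

```python
import heapq


def top(items, k, key=lambda x: x):
    """Return the k largest items in descending order.

    Hypothetical helper: keeps a min-heap of at most k entries, so the
    root is always the smallest of the current top k candidates.
    """
    if k <= 0:
        return []
    heap = []  # entries are (key, sequence number, item)
    for seq, item in enumerate(items):
        # The unique sequence number breaks ties, so items themselves
        # are never compared directly.
        entry = (key(item), seq, item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # The new item beats the smallest kept item: swap it in.
            heapq.heapreplace(heap, entry)
    # The heap is only partially ordered, so sort the k survivors.
    ordered = sorted(heap, key=lambda e: e[0], reverse=True)
    return [item for _, _, item in ordered]


if __name__ == "__main__":
    print(top([5, 1, 9, 3, 7], 3))
```

Each element costs at most one heap operation of O(log k), so a full pass is O(n log k) with O(k) extra memory. A takeOrdered-style variant (k smallest, ascending) would mirror this with the comparison reversed.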