author | Davies Liu <davies.liu@gmail.com> | 2014-08-26 16:57:40 -0700 |
---|---|---|
committer | Matei Zaharia <matei@databricks.com> | 2014-08-26 16:57:40 -0700 |
commit | f1e71d4c3ba678fc108effb05cf2d6101dadc0ce (patch) | |
tree | ef5c761a9bf3a75c59b03148985a4a83e64a2c16 /.rat-excludes | |
parent | c4787a3690a9ed3b8b2c6c294fc4a6915436b6f7 (diff) | |
[SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey()
Use external sort to support sorting large datasets in the reduce stage.
Author: Davies Liu <davies.liu@gmail.com>
Closes #1978 from davies/sort and squashes the following commits:
bbcd9ba [Davies Liu] check spilled bytes in tests
b125d2f [Davies Liu] add test for external sort in rdd
eae0176 [Davies Liu] choose different disks from different processes and instances
1f075ed [Davies Liu] Merge branch 'master' into sort
eb53ca6 [Davies Liu] Merge branch 'master' into sort
644abaf [Davies Liu] add license in LICENSE
19f7873 [Davies Liu] improve tests
55602ee [Davies Liu] use external sort in sortBy() and sortByKey()
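The external sort named in the commit message can be illustrated with a minimal sketch. This is not the code from this patch: the function `external_sorted`, the helpers `_dump_run` and `_load_run`, and the fixed `chunk_size` threshold are all hypothetical names and choices made for illustration, and the real change may decide when to spill in a different way. The sketch assumes Python 3.5+, where the standard `heapq.merge` accepts `key` and `reverse`.

```python
import heapq
import itertools
import pickle
import tempfile


def _dump_run(sorted_items):
    """Write one sorted run to a temporary file and return the open file."""
    f = tempfile.TemporaryFile()
    for item in sorted_items:
        pickle.dump(item, f)
    f.seek(0)  # rewind so the run can be read back from the start
    return f


def _load_run(f):
    """Lazily yield the items of a run written by _dump_run."""
    try:
        while True:
            yield pickle.load(f)
    except EOFError:
        f.close()  # run exhausted; closing deletes the temporary file


def external_sorted(iterable, key=None, reverse=False, chunk_size=10000):
    """Sort an iterable by spilling sorted runs of chunk_size items to disk.

    Hypothetical helper: chunk_size stands in for a real memory limit.
    """
    it = iter(iterable)
    runs = []
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            break
        chunk.sort(key=key, reverse=reverse)  # in-memory sort of one run
        runs.append(_dump_run(chunk))
    # k-way merge of the spilled runs; only one item per run is held in memory
    return heapq.merge(*(_load_run(f) for f in runs), key=key, reverse=reverse)
```

Calling `list(external_sorted(data, chunk_size=3))` on a small list gives the same order as `sorted(data)`, while only `chunk_size` items are sorted in memory at a time. The merge step is the part that depends on a heap-based merge with `key` and `reverse` support.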
Diffstat (limited to '.rat-excludes')
-rw-r--r-- | .rat-excludes | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/.rat-excludes b/.rat-excludes
index eaefef1b0a..fb6323daf9 100644
--- a/.rat-excludes
+++ b/.rat-excludes
@@ -31,6 +31,7 @@ sorttable.js
 .*data
 .*log
 cloudpickle.py
+heapq3.py
 join.py
 SparkExprTyper.scala
 SparkILoop.scala