author | Davies Liu <davies@databricks.com> | 2015-06-18 13:45:58 -0700 |
---|---|---|
committer | Josh Rosen <joshrosen@databricks.com> | 2015-06-18 13:45:58 -0700 |
commit | 9b2002722273f98e193ad6cd54c9626292ab27d1 (patch) | |
tree | e78f4f6e47fbcbf7e062942407bcf2c380717b9c /python/pyspark/mllib/classification.py | |
parent | 31641128b34d6f2aa7cb67324c24dd8b3ed84689 (diff) | |
[SPARK-8202] [PYSPARK] fix infinite loop during external sort in PySpark
During external sort, the batch size could grow to the maximum of 10000 and then shrink down to zero, causing an infinite loop.
Assuming that items usually have similar sizes, we don't need to adjust the batch size after the first spill.
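
The failure mode described above can be illustrated with a small, self-contained simulation. This is a simplified sketch, not the actual code in `shuffle.py`: the function name `count_rounds`, the starting batch size of 100, the halving and doubling rule, and the `pressure_from` parameter (a stand-in for real memory measurements) are all assumptions made for illustration. Only the 10000 cap and the "stop adjusting after the first spill" idea come from the commit message.

```python
import itertools

MAX_BATCH = 10000  # upper bound mentioned in the commit message


def count_rounds(items, pressure_from, freeze_after_spill, max_rounds=1000):
    """Read `items` in batches and return the number of rounds needed,
    or None if the loop is still running after `max_rounds`.

    pressure_from: hypothetical stand-in for memory pressure; every round
        numbered >= pressure_from is treated as "over the memory limit"
        and triggers a spill.
    freeze_after_spill: if True, never change the batch size once a spill
        has happened (the behaviour the commit message describes).
    """
    iterator = iter(items)
    batch = 100  # assumed starting batch size
    spilled = False
    for round_no in range(1, max_rounds + 1):
        chunk = list(itertools.islice(iterator, batch))
        if len(chunk) < batch:
            return round_no  # input exhausted, the loop ends normally
        if round_no >= pressure_from:
            spilled = True
            if not freeze_after_spill:
                # Repeated halving eventually reaches 0; from then on
                # islice() yields nothing and len(chunk) < batch is
                # never true, so the loop makes no progress.
                batch //= 2
        elif not spilled:
            batch = min(batch * 2, MAX_BATCH)
    return None  # gave up: the loop would not have terminated


if __name__ == "__main__":
    data = range(1000)
    print(count_rounds(data, pressure_from=1, freeze_after_spill=False))
    print(count_rounds(data, pressure_from=1, freeze_after_spill=True))
```

With constant memory pressure, the shrinking variant stalls once the batch size hits zero, while the frozen variant keeps reading fixed-size batches until the input is exhausted. The `max_rounds` guard exists only so that the sketch itself terminates.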
cc JoshRosen rxin angelini
Author: Davies Liu <davies@databricks.com>
Closes #6714 from davies/batch_size and squashes the following commits:
b170dfb [Davies Liu] update test
b9be832 [Davies Liu] Merge branch 'batch_size' of github.com:davies/spark into batch_size
6ade745 [Davies Liu] update test
5c21777 [Davies Liu] Update shuffle.py
e746aec [Davies Liu] fix batch size during sort
Diffstat (limited to 'python/pyspark/mllib/classification.py')
0 files changed, 0 insertions, 0 deletions