author | Xiangrui Meng <meng@databricks.com> | 2014-11-14 12:43:17 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-11-14 12:43:17 -0800 |
commit | abd581752f9314791a688690c07ad1bb68cc09fe (patch) | |
tree | 190346541ee69688e77341c60ac863016c7887bf /conf | |
parent | 77e845ca7726ffee2d6f8e33ea56ec005dde3874 (diff) | |
[SPARK-4398][PySpark] specialize sc.parallelize(xrange)
`sc.parallelize(range(1 << 20), 1).count()` may take 15 seconds to finish, and the RDD object stores the entire list, which makes the task size very large. This PR adds a specialized version for xrange.
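A minimal sketch of the idea, not the code in this patch: instead of shipping the materialized list, each partition can rebuild its own sub-range from the start, step, and length alone. The helper name `slice_range` is hypothetical, and the sketch uses Python 3 `range` (the lazy counterpart of Python 2 `xrange`) so that it runs without Spark.

```python
def slice_range(r, num_slices):
    """Split a range into num_slices contiguous sub-ranges.

    Hypothetical helper for illustration only. Nothing is materialized:
    each slice is itself a lazy range described by three integers.
    """
    size = len(r)
    if size == 0:
        return [range(0) for _ in range(num_slices)]
    start0, step = r.start, r.step

    def get_start(split):
        # First element of partition `split`; split == num_slices gives the end bound.
        return start0 + (split * size // num_slices) * step

    return [range(get_start(i), get_start(i + 1), step)
            for i in range(num_slices)]


parts = slice_range(range(1 << 20), 4)
print([len(p) for p in parts])
```

Because each slice is defined by a few integers, the per-task payload in this sketch stays constant in size regardless of how long the range is.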
JoshRosen davies
Author: Xiangrui Meng <meng@databricks.com>
Closes #3264 from mengxr/SPARK-4398 and squashes the following commits:
8953c41 [Xiangrui Meng] follow davies' suggestion
cbd58e3 [Xiangrui Meng] specialize sc.parallelize(xrange)
Diffstat (limited to 'conf')
0 files changed, 0 insertions, 0 deletions