[SPARK-4398][PySpark] specialize sc.parallelize(xrange) - spark

author	Xiangrui Meng <meng@databricks.com>	2014-11-14 12:43:17 -0800
committer	Xiangrui Meng <meng@databricks.com>	2014-11-14 12:43:25 -0800
commit	3014803ead0aac31f36f4387c919174877525ff4 (patch)
tree	0c6fd3005a0cc7922da595ff28eb545688a6b17c /python/pyspark/tests.py
parent	3219271f403091d4d3af4cddd08121ba538a459b (diff)

[SPARK-4398][PySpark] specialize sc.parallelize(xrange)

`sc.parallelize(range(1 << 20), 1).count()` may take 15 seconds to finish and the rdd object stores the entire list, making task size very large. This PR adds a specialized version for xrange.

JoshRosen davies

Author: Xiangrui Meng <meng@databricks.com>

Closes #3264 from mengxr/SPARK-4398 and squashes the following commits:

8953c41 [Xiangrui Meng] follow davies' suggestion
cbd58e3 [Xiangrui Meng] specialize sc.parallelize(xrange)

(cherry picked from commit abd581752f9314791a688690c07ad1bb68cc09fe)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
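
The idea behind specializing on xrange can be illustrated without Spark. The following is a minimal sketch, not the code from this patch: it assumes Python 3 (where `range` is lazy, like Python 2's `xrange`), and the helper name `slice_range` is hypothetical. Each partition is described only by its start, stop, and step, so nothing proportional to the number of elements has to be stored or shipped with a task.

```python
def slice_range(r, num_slices):
    """Split a lazy range into num_slices contiguous sub-ranges.

    Hypothetical helper for illustration; only range bounds are
    computed, the elements themselves are never materialized.
    """
    size = len(r)
    if size == 0:
        # Nothing to distribute: every partition is empty.
        return [range(0) for _ in range(num_slices)]
    start0, step = r.start, r.step

    def start_of(split):
        # Index of the first element of partition `split`, mapped to its value.
        return start0 + (split * size // num_slices) * step

    return [range(start_of(i), start_of(i + 1), step)
            for i in range(num_slices)]


# Usage: four partitions over 2**20 elements, each just three integers.
parts = slice_range(range(1 << 20), 4)
print([len(p) for p in parts])
```

Because each partition is itself a lazy range, the size of the description stays constant no matter how many elements the original range covers.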

Diffstat (limited to 'python/pyspark/tests.py')

0 files changed, 0 insertions, 0 deletions

