SPARK-815. Python parallelize() should split lists before batching - spark

author	Matei Zaharia <matei@eecs.berkeley.edu>	2013-07-29 00:09:11 -0400
committer	Matei Zaharia <matei@eecs.berkeley.edu>	2013-07-29 02:51:43 -0400
commit	feba7ee540fca28872957120e5e39b9e36466953 (patch)
tree	c4349aa082e6727f638bc360ba6d9352a88959bc /docs
parent	d75c3086951f603ec30b2527c24559e053ed7f25 (diff)

SPARK-815. Python parallelize() should split lists before batching

One unfortunate consequence of this fix is that we materialize any collections that are given to us as generators, but this seems necessary to get reasonable behavior on small collections. We could add a batchSize parameter later to bypass auto-computation of the batch size if this becomes a problem (e.g. if users really want to parallelize big generators nicely).
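
As a rough illustration of the idea in this message (not the code from this commit), the sketch below materializes the input, splits it into slices first, and only then groups each slice into batches. The function name `split_then_batch`, the `max_batch_size` parameter, and the batch-size heuristic are all hypothetical and chosen only for the example.

```python
def split_then_batch(data, num_slices, max_batch_size=1024):
    """Split `data` into `num_slices` slices, then batch within each slice.

    Hypothetical helper for illustration; not PySpark's implementation.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")

    # Materialize the input so its length is known. This is the cost the
    # commit message mentions: a generator is fully consumed here.
    items = list(data)
    n = len(items)

    # Split first, so that a small collection still spreads over the slices.
    slices = [
        items[i * n // num_slices:(i + 1) * n // num_slices]
        for i in range(num_slices)
    ]

    # Assumed heuristic: never batch more elements than one slice holds,
    # and never use a batch size below 1.
    batch_size = max(1, min(n // num_slices, max_batch_size))

    # Batch within each slice, so no batch ever spans two slices.
    return [
        [s[j:j + batch_size] for j in range(0, len(s), batch_size)]
        for s in slices
    ]
```

With three elements and three slices, each slice receives one element instead of a single batch swallowing the whole list, which is the small-collection behavior the message is concerned with.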

Diffstat (limited to 'docs')

0 files changed, 0 insertions, 0 deletions

