author | Matei Zaharia <matei@eecs.berkeley.edu> | 2013-07-29 00:09:11 -0400 |
---|---|---|
committer | Matei Zaharia <matei@eecs.berkeley.edu> | 2013-07-29 02:51:43 -0400 |
commit | feba7ee540fca28872957120e5e39b9e36466953 (patch) | |
tree | c4349aa082e6727f638bc360ba6d9352a88959bc /docs | |
parent | d75c3086951f603ec30b2527c24559e053ed7f25 (diff) | |
SPARK-815. Python parallelize() should split lists before batching
One unfortunate consequence of this fix is that we materialize any
collections that are given to us as generators, but this seems necessary
to get reasonable behavior on small collections. We could add a
batchSize parameter later to bypass auto-computation of batch size if
this becomes a problem (e.g. if users really want to parallelize big
generators without materializing them).
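
The trade-off described above can be illustrated with a minimal sketch. This is not the code from this commit: `split_then_batch`, its `max_batch_size` default, and the way the batch size is capped are all hypothetical, chosen only to show why the input must be materialized (to get its length) and why the batch size is derived from the per-slice element count so that a small collection still spreads across all slices.

```python
def split_then_batch(data, num_slices, max_batch_size=1024):
    """Hypothetical helper: split `data` into `num_slices` slices first,
    then group each slice into batches."""
    # Materialize the input so its length is known. For a generator this
    # consumes it entirely, which is the cost the commit message mentions.
    items = list(data)
    n = len(items)

    # Assumed auto-computation: never let one batch exceed the number of
    # elements per slice, so small collections are not packed into a
    # single batch that lands in a single slice.
    batch_size = max(1, min(n // num_slices, max_batch_size))

    slices = []
    for i in range(num_slices):
        # Integer arithmetic spreads any remainder across the slices.
        start = i * n // num_slices
        end = (i + 1) * n // num_slices
        part = items[start:end]
        # Batch only after splitting.
        batches = [part[j:j + batch_size]
                   for j in range(0, len(part), batch_size)]
        slices.append(batches)
    return slices


if __name__ == "__main__":
    for index, batches in enumerate(split_then_batch(range(10), 4)):
        print(index, batches)
```

Batching before splitting would instead put all ten elements of this example into one batch, leaving three of the four slices empty. An explicit batch size argument, as the message suggests for later, could skip the `list()` call when the caller does not need the length-based computation.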
Diffstat (limited to 'docs')
0 files changed, 0 insertions, 0 deletions