author | Stephen Haberman <stephen@exigencecorp.com> | 2013-02-10 02:27:03 -0600 |
---|---|---|
committer | Stephen Haberman <stephen@exigencecorp.com> | 2013-02-10 02:27:03 -0600 |
commit | 680f42e6cd1ee8593136323a539dc5117b165377 (patch) | |
tree | 4e7b13abffb729ae9fd06a7c77ac46eff3809355 /docs | |
parent | f750daa5103170d6c86cc321bf9e98bf067ea1bc (diff) | |
Change defaultPartitioner to use upstream split size.
Previously it used SparkContext.defaultParallelism, which occasionally
ended up being a very bad guess. Looking at the partition counts of the
upstream RDDs seems to make better use of the available context.
Also sorted the upstream RDDs by partition count first: given one
hugely-partitioned RDD and one tiny-partitioned RDD, it is unlikely
that we want the resulting RDD to be tiny-partitioned.
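
The choice described above can be modelled in a few lines. This is a minimal Python sketch, not the Scala code from this commit: `UpstreamRDD` and `default_partition_count` are hypothetical names, and the sketch assumes that "sorted by partition size" means largest first, with the largest upstream count becoming the result's partition count.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class UpstreamRDD:
    """Hypothetical stand-in for an RDD: just a name and a partition count."""
    name: str
    num_partitions: int


def default_partition_count(rdds: Sequence[UpstreamRDD]) -> int:
    """Pick a partition count for a derived RDD from its upstream RDDs.

    Assumption: the upstream RDDs are ordered by partition count,
    largest first, and the largest count wins.
    """
    if not rdds:
        raise ValueError("need at least one upstream RDD")
    # Sort descending so the most heavily partitioned RDD comes first.
    by_size = sorted(rdds, key=lambda r: r.num_partitions, reverse=True)
    return by_size[0].num_partitions


if __name__ == "__main__":
    huge = UpstreamRDD("huge", 1000)
    tiny = UpstreamRDD("tiny", 2)
    # The argument order does not matter; the larger count is chosen.
    print(default_partition_count([tiny, huge]))
```

In this model, combining a 1000-partition RDD with a 2-partition RDD yields 1000 partitions rather than a global default that ignores both inputs.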
Diffstat (limited to 'docs')
0 files changed, 0 insertions, 0 deletions