path: root/docs/tuning.md
author: Stephen Haberman <stephen@exigencecorp.com> 2013-02-10 02:27:03 -0600
committer: Stephen Haberman <stephen@exigencecorp.com> 2013-02-10 02:27:03 -0600
commit: 680f42e6cd1ee8593136323a539dc5117b165377
tree: 4e7b13abffb729ae9fd06a7c77ac46eff3809355 /docs/tuning.md
parent: f750daa5103170d6c86cc321bf9e98bf067ea1bc
Change defaultPartitioner to use upstream split size.
Previously it used SparkContext.defaultParallelism, which occasionally ended up being a very bad guess. Looking at the upstream RDDs seems to make better use of the context. Also sort the upstream RDDs by number of partitions first: if we have a hugely-partitioned RDD and a tiny-partitioned RDD, it is unlikely we want the resulting RDD to be tiny-partitioned.
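As a rough illustration of the rule described above, here is a minimal plain-Python sketch. It is not the Spark code from this commit: `FakeRDD`, `HashPartitioner`, and `default_partitioner` are hypothetical stand-ins that only model "pick the partition count of the most-partitioned upstream RDD", assuming that is the whole rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashPartitioner:
    """Hypothetical stand-in: only records how many partitions to produce."""
    num_partitions: int


@dataclass(frozen=True)
class FakeRDD:
    """Hypothetical stand-in for an upstream RDD; only its split count matters here."""
    name: str
    num_splits: int


def default_partitioner(rdd: FakeRDD, *others: FakeRDD) -> HashPartitioner:
    """Choose a partitioner from the upstream RDDs instead of a global default.

    The upstream RDDs are sorted by number of splits, largest first, so a
    tiny-partitioned input cannot drag the result down to a tiny partition count.
    """
    by_size = sorted((rdd, *others), key=lambda r: r.num_splits, reverse=True)
    return HashPartitioner(by_size[0].num_splits)


# Example inputs: one hugely-partitioned and one tiny-partitioned upstream RDD.
huge = FakeRDD("huge", 1000)
tiny = FakeRDD("tiny", 2)
chosen = default_partitioner(tiny, huge)
```

In this sketch the argument order does not matter: passing `tiny` first still yields a partitioner sized after `huge`, which is the behavior the commit message argues for.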
Diffstat (limited to 'docs/tuning.md')
0 files changed, 0 insertions, 0 deletions