author     Matei Zaharia <matei@eecs.berkeley.edu>  2013-02-25 09:22:04 -0800
committer  Matei Zaharia <matei@eecs.berkeley.edu>  2013-02-25 09:22:04 -0800
commit     d6e6abece306008c50410807669596d73d6d6738 (patch)
tree       e16823cba6d0de277d58712a236ec9ecf816556a /docs
parent     fb7625059837b124da1e31bd126f5278eef68bf9 (diff)
parent     c44ccf2862e8be183ccecac3bf61f9651b21984a (diff)
Merge pull request #459 from stephenh/bettersplits
Change defaultPartitioner to use upstream split size.
Diffstat (limited to 'docs')
-rw-r--r--  docs/tuning.md | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/docs/tuning.md b/docs/tuning.md
index 738c530458..843380b9a2 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -213,10 +213,10 @@ but at a high level, managing how frequently full GC takes place can help in red
Clusters will not be fully utilized unless you set the level of parallelism for each operation high
enough. Spark automatically sets the number of "map" tasks to run on each file according to its size
-(though you can control it through optional parameters to `SparkContext.textFile`, etc), but for
-distributed "reduce" operations, such as `groupByKey` and `reduceByKey`, it uses a default value of 8.
-You can pass the level of parallelism as a second argument (see the
-[`spark.PairRDDFunctions`](api/core/index.html#spark.PairRDDFunctions) documentation),
+(though you can control it through optional parameters to `SparkContext.textFile`, etc), and for
+distributed "reduce" operations, such as `groupByKey` and `reduceByKey`, it uses the largest
+parent RDD's number of partitions. You can pass the level of parallelism as a second argument
+(see the [`spark.PairRDDFunctions`](api/core/index.html#spark.PairRDDFunctions) documentation),
or set the system property `spark.default.parallelism` to change the default.
In general, we recommend 2-3 tasks per CPU core in your cluster.
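The behavior this patch introduces (defaulting "reduce" parallelism to the largest parent RDD's partition count instead of a fixed 8) can be sketched in plain Scala. `FakeRDD` and `defaultNumPartitions` below are hypothetical stand-ins for illustration, not Spark's actual `Partitioner.defaultPartitioner`:

```scala
// Minimal sketch of the new default-partitioner rule, under the
// assumption described in the diff: pick the largest number of
// partitions among the parent RDDs.
case class FakeRDD(numPartitions: Int)

def defaultNumPartitions(parents: Seq[FakeRDD]): Int =
  parents.map(_.numPartitions).max

// e.g. a reduce over a 100-partition RDD joined with a 30-partition RDD
// now defaults to 100 tasks rather than the old fixed value of 8.
println(defaultNumPartitions(Seq(FakeRDD(100), FakeRDD(30))))
```

As the doc text notes, this default can still be overridden per operation (e.g. a second argument to `reduceByKey`) or globally via the `spark.default.parallelism` system property.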