author Matei Zaharia <matei@eecs.berkeley.edu> 2012-10-06 20:07:10 -0700
committer Matei Zaharia <matei@eecs.berkeley.edu> 2012-10-06 20:07:10 -0700
commit dc28a3ac0a052f7327d03de76c3b153cda2b616a
tree 953ac4550c3e49b6e75772da76186d714b2caeaa /docs
parent 9a3b3f32a3ccb849293180a899377e8468f7544a
Modified shuffle to limit the maximum outstanding data size in bytes,
instead of the maximum number of outstanding fetches. This should make it faster when there are many small map output files, as well as more robust to overallocating memory on large map outputs.
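To make the change in policy concrete, here is a minimal Python sketch of a reduce task that caps outstanding fetch *bytes* instead of the *number* of outstanding fetches. It is a toy model, not Spark's code: `ByteLimitedFetcher` and `run_fetches` are hypothetical names, fetches are assumed to complete in FIFO order, and the rule that a request is always issued when nothing is in flight (so an output larger than the cap can still be fetched) is an assumption of this sketch.

```python
from collections import deque
from typing import Deque, List, Tuple


class ByteLimitedFetcher:
    """Hypothetical model of a reduce task that caps outstanding fetch bytes
    rather than the number of outstanding fetches."""

    def __init__(self, max_bytes_in_flight: int) -> None:
        self.max_bytes_in_flight = max_bytes_in_flight
        self.bytes_in_flight = 0
        self.peak_bytes = 0
        self.pending: Deque[Tuple[str, int]] = deque()    # (block id, size) not yet requested
        self.in_flight: Deque[Tuple[str, int]] = deque()  # requested, not yet received

    def add(self, block_id: str, size: int) -> None:
        self.pending.append((block_id, size))

    def _can_issue(self, size: int) -> bool:
        # Assumption: if nothing is in flight, issue the request anyway so a
        # single output larger than the cap is not starved forever.
        if not self.in_flight:
            return True
        return self.bytes_in_flight + size <= self.max_bytes_in_flight

    def issue_requests(self) -> List[str]:
        """Issue pending fetches, in order, until the next one would exceed the cap."""
        issued: List[str] = []
        while self.pending and self._can_issue(self.pending[0][1]):
            block_id, size = self.pending.popleft()
            self.in_flight.append((block_id, size))
            self.bytes_in_flight += size
            issued.append(block_id)
        self.peak_bytes = max(self.peak_bytes, self.bytes_in_flight)
        return issued

    def complete_oldest(self) -> str:
        """Simulate the oldest outstanding fetch finishing and releasing its buffer."""
        block_id, size = self.in_flight.popleft()
        self.bytes_in_flight -= size
        return block_id


def run_fetches(sizes: List[int], max_bytes_in_flight: int) -> Tuple[List[List[str]], int]:
    """Return the batches of requests issued and the peak bytes in flight."""
    fetcher = ByteLimitedFetcher(max_bytes_in_flight)
    for i, size in enumerate(sizes):
        fetcher.add(f"map_{i}", size)
    batches = [fetcher.issue_requests()]
    while fetcher.in_flight:
        fetcher.complete_oldest()
        newly_issued = fetcher.issue_requests()
        if newly_issued:
            batches.append(newly_issued)
    return batches, fetcher.peak_bytes
```

With sizes in MB and a cap of 48, `run_fetches([1] * 10, 48)` issues all ten small requests in one batch, whereas a limit of four concurrent fetches would issue only four at a time. `run_fetches([40, 40, 40], 48)` issues the large outputs one at a time, keeping at most 40 MB outstanding instead of 120 MB.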
Diffstat (limited to 'docs')
-rw-r--r-- docs/configuration.md 8
1 files changed, 5 insertions, 3 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index fa7123af1b..0987f7f7b1 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -139,10 +139,12 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
- <td>spark.blockManager.parallelFetches</td>
- <td>4</td>
+ <td>spark.reducer.maxMbInFlight</td>
+ <td>48</td>
<td>
- Number of map output files to fetch concurrently from each reduce task.
+ Maximum size (in megabytes) of map outputs to fetch simultaneously from each reduce task. Since
+ each output requires us to create a buffer to receive it, this represents a fixed memory overhead
+ per reduce task, so keep it small unless you have a large amount of memory.
</td>
</tr>
<tr>
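Because `spark.reducer.maxMbInFlight` is described as a fixed memory overhead per reduce task, the total buffer memory on a worker scales with how many reduce tasks run there at once. The sketch below does that arithmetic. The function names and the `concurrent_reduce_tasks` parameter (for example, one task per core) are assumptions for illustration, and the result is only an approximation: it ignores any single map output larger than the cap and all other per-task memory.

```python
def reduce_buffer_overhead_mb(concurrent_reduce_tasks: int, max_mb_in_flight: int = 48) -> int:
    """Approximate worst-case receive-buffer memory on one worker, in MB.

    Assumes each running reduce task holds up to max_mb_in_flight of
    outstanding map-output buffers (48 is the default in the diff).
    """
    return concurrent_reduce_tasks * max_mb_in_flight


def max_mb_in_flight_for_budget(buffer_budget_mb: int, concurrent_reduce_tasks: int) -> int:
    """Largest whole-MB setting that keeps total buffers within a budget."""
    return buffer_budget_mb // concurrent_reduce_tasks
```

For instance, 8 concurrent reduce tasks at the default of 48 MB give roughly 384 MB of buffers, and fitting those 8 tasks into a 256 MB budget would call for a setting of about 32.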