author | Matei Zaharia <matei@eecs.berkeley.edu> | 2012-10-06 20:07:10 -0700 |
---|---|---|
committer | Matei Zaharia <matei@eecs.berkeley.edu> | 2012-10-06 20:07:10 -0700 |
commit | dc28a3ac0a052f7327d03de76c3b153cda2b616a (patch) | |
tree | 953ac4550c3e49b6e75772da76186d714b2caeaa /docs/configuration.md | |
parent | 9a3b3f32a3ccb849293180a899377e8468f7544a (diff) | |
Modified shuffle to limit the maximum outstanding data size in bytes,
instead of the maximum number of outstanding fetches. This should make
it faster when there are many small map output files, as well as more
robust to overallocating memory on large map outputs.
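The change described above can be illustrated with a small simulation. This is a minimal sketch, not the code from this commit: the function name `simulate_fetches`, the FIFO completion order, and the rule that one request is always allowed when nothing is in flight are all assumptions made for illustration.

```python
from collections import deque

MB = 1024 * 1024


def simulate_fetches(sizes, max_bytes_in_flight):
    """Simulate byte-budgeted fetching and return (peak_bytes, peak_count).

    Hypothetical model: requests are issued in list order and complete in
    FIFO order. This is not Spark's implementation.
    """
    pending = deque(sizes)
    in_flight = deque()
    bytes_in_flight = 0
    peak_bytes = 0
    peak_count = 0
    while pending or in_flight:
        # Issue requests while the byte budget allows. Assumption: always
        # allow one request when nothing is in flight, so a single output
        # larger than the budget cannot stall the reducer.
        while pending and (
            not in_flight
            or bytes_in_flight + pending[0] <= max_bytes_in_flight
        ):
            size = pending.popleft()
            in_flight.append(size)
            bytes_in_flight += size
        peak_bytes = max(peak_bytes, bytes_in_flight)
        peak_count = max(peak_count, len(in_flight))
        # The oldest request completes and frees its share of the budget.
        bytes_in_flight -= in_flight.popleft()
    return peak_bytes, peak_count


# Many small outputs: the 48 MB budget keeps 48 one-megabyte fetches going.
small = simulate_fetches([1 * MB] * 100, 48 * MB)
# Large outputs: only one 40 MB fetch fits in the budget at a time.
large = simulate_fetches([40 * MB] * 10, 48 * MB)
```

Under this model, a count limit of 4 would keep only 4 MB in flight for the small outputs, and would buffer 4 × 40 MB = 160 MB for the large ones. A byte limit behaves better in both cases, which matches the motivation given in the commit message.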
Diffstat (limited to 'docs/configuration.md')
-rw-r--r-- | docs/configuration.md | 8 |
1 files changed, 5 insertions, 3 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index fa7123af1b..0987f7f7b1 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -139,10 +139,12 @@ Apart from these, the following properties are also available, and may be useful
   </td>
 </tr>
 <tr>
-  <td>spark.blockManager.parallelFetches</td>
-  <td>4</td>
+  <td>spark.reducer.maxMbInFlight</td>
+  <td>48</td>
   <td>
-    Number of map output files to fetch concurrently from each reduce task.
+    Maximum size (in megabytes) of map outputs to fetch simultaneously from each reduce task. Since
+    each output requires us to create a buffer to receive it, this represents a fixed memory overhead
+    per reduce task, so keep it small unless you have a large amount of memory.
   </td>
 </tr>
 <tr>
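Because the new property is described as a fixed memory overhead per reduce task, its cost on a machine scales with the number of reduce tasks running there at once. The sketch below shows that arithmetic only; the helper name `reduce_buffer_overhead_mb` and the figure of 8 concurrent reduce tasks are hypothetical and do not come from the commit.

```python
def reduce_buffer_overhead_mb(max_mb_in_flight, concurrent_reduce_tasks):
    """Estimate fetch-buffer memory (in MB) used by reduce tasks on one node.

    Assumes each running reduce task may use its full in-flight budget at
    the same time, which is a worst-case simplification.
    """
    return max_mb_in_flight * concurrent_reduce_tasks


# Default of 48 MB with a hypothetical 8 concurrent reduce tasks per node.
overhead = reduce_buffer_overhead_mb(48, 8)  # 384 MB
```

An estimate like this may help when deciding whether to lower `spark.reducer.maxMbInFlight` on nodes with little memory.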