authorMatei Zaharia <matei@databricks.com>2015-07-22 15:28:09 -0700
committerMatei Zaharia <matei@databricks.com>2015-07-22 15:28:09 -0700
commitfe26584a1f5b472fb2e87aa7259aec822a619a3b (patch)
treed568c3aeda422e91d2b3d1a9335605da55be73fa /docs
parent1aca9c13c144fa336af6afcfa666128bf77c49d4 (diff)
[SPARK-9244] Increase some memory defaults
There are a few memory limits that people hit often and that we could make higher, especially now that memory sizes have grown.

- spark.akka.frameSize: This defaults to 10 but is often hit for map output statuses in large shuffles. This memory is not fully allocated up-front, so we can simply raise the limit without affecting jobs that never send a status that large. We increase it to 128.
- spark.executor.memory: Defaults to 512m, which is really small. We increase it to 1g.

Author: Matei Zaharia <matei@databricks.com>

Closes #7586 from mateiz/configs and squashes the following commits:

ce0038a [Matei Zaharia] [SPARK-9244] Increase some memory defaults
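For deployments where even the raised defaults are too low, both settings can still be overridden in conf/spark-defaults.conf. A minimal sketch; the specific values here are illustrative, not recommendations:

```
# conf/spark-defaults.conf -- per-deployment overrides of the raised defaults
spark.executor.memory   4g     # default raised from 512m to 1g by this patch
spark.akka.frameSize    256    # default raised from 10 to 128; value is in MB
```

The same keys can also be passed on the command line via `spark-submit --conf key=value`, which takes precedence over the defaults file.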
Diffstat (limited to 'docs')
-rw-r--r--  docs/configuration.md | 16
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index 8a186ee51c..fea259204a 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -31,7 +31,6 @@ which can help detect bugs that only exist when we run in a distributed context.
val conf = new SparkConf()
.setMaster("local[2]")
.setAppName("CountingSheep")
- .set("spark.executor.memory", "1g")
val sc = new SparkContext(conf)
{% endhighlight %}
@@ -84,7 +83,7 @@ Running `./bin/spark-submit --help` will show the entire list of these options.
each line consists of a key and a value separated by whitespace. For example:
spark.master spark://5.6.7.8:7077
- spark.executor.memory 512m
+ spark.executor.memory 4g
spark.eventLog.enabled true
spark.serializer org.apache.spark.serializer.KryoSerializer
@@ -150,10 +149,9 @@ of the most common options to set are:
</tr>
<tr>
<td><code>spark.executor.memory</code></td>
- <td>512m</td>
+ <td>1g</td>
<td>
- Amount of memory to use per executor process, in the same format as JVM memory strings
- (e.g. <code>512m</code>, <code>2g</code>).
+ Amount of memory to use per executor process (e.g. <code>2g</code>, <code>8g</code>).
</td>
</tr>
<tr>
@@ -886,11 +884,11 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.akka.frameSize</code></td>
- <td>10</td>
+ <td>128</td>
<td>
- Maximum message size to allow in "control plane" communication (for serialized tasks and task
- results), in MB. Increase this if your tasks need to send back large results to the driver
- (e.g. using <code>collect()</code> on a large dataset).
+ Maximum message size to allow in "control plane" communication; generally only applies to map
+ output size information sent between executors and the driver. Increase this if you are running
+ jobs with many thousands of map and reduce tasks and see messages about the frame size.
</td>
</tr>
<tr>