author    Matei Zaharia <matei@databricks.com>  2015-07-22 15:28:09 -0700
committer Matei Zaharia <matei@databricks.com>  2015-07-22 15:28:09 -0700
commit    fe26584a1f5b472fb2e87aa7259aec822a619a3b
tree      d568c3aeda422e91d2b3d1a9335605da55be73fa /docs
parent    1aca9c13c144fa336af6afcfa666128bf77c49d4
[SPARK-9244] Increase some memory defaults
There are a few memory limits that people hit often and that we could
make higher, especially now that memory sizes have grown.
- spark.akka.frameSize: This defaults to 10 (MB) but is often hit for map
  output statuses in large shuffles. This memory is not fully allocated
  up-front, so we can simply make it larger without affecting jobs that
  never send a status that large. We increase it to 128.
- spark.executor.memory: Defaults to 512m, which is really small. We
  increase it to 1g.
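For jobs that need even more than the new defaults, both settings can still be overridden; a minimal sketch of a conf/spark-defaults.conf entry (the values here are illustrative, not recommendations):

```
# conf/spark-defaults.conf -- illustrative overrides of the two settings
# whose defaults this commit raises
spark.executor.memory   4g
spark.akka.frameSize    256
```

The same keys can also be set programmatically via SparkConf.set or on the spark-submit command line with --conf.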
Author: Matei Zaharia <matei@databricks.com>
Closes #7586 from mateiz/configs and squashes the following commits:
ce0038a [Matei Zaharia] [SPARK-9244] Increase some memory defaults
Diffstat (limited to 'docs')
-rw-r--r--  docs/configuration.md | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index 8a186ee51c..fea259204a 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -31,7 +31,6 @@ which can help detect bugs that only exist when we run in a distributed context.

 val conf = new SparkConf()
              .setMaster("local[2]")
              .setAppName("CountingSheep")
-             .set("spark.executor.memory", "1g")
 val sc = new SparkContext(conf)
 {% endhighlight %}
@@ -84,7 +83,7 @@ Running `./bin/spark-submit --help` will show the entire list of these options.
 each line consists of a key and a value separated by whitespace. For example:

   spark.master            spark://5.6.7.8:7077
-  spark.executor.memory   512m
+  spark.executor.memory   4g
   spark.eventLog.enabled  true
   spark.serializer        org.apache.spark.serializer.KryoSerializer
@@ -150,10 +149,9 @@ of the most common options to set are:
 </tr>
 <tr>
   <td><code>spark.executor.memory</code></td>
-  <td>512m</td>
+  <td>1g</td>
   <td>
-    Amount of memory to use per executor process, in the same format as JVM memory strings
-    (e.g. <code>512m</code>, <code>2g</code>).
+    Amount of memory to use per executor process (e.g. <code>2g</code>, <code>8g</code>).
   </td>
 </tr>
 <tr>
@@ -886,11 +884,11 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.akka.frameSize</code></td>
-  <td>10</td>
+  <td>128</td>
   <td>
-    Maximum message size to allow in "control plane" communication (for serialized tasks and task
-    results), in MB. Increase this if your tasks need to send back large results to the driver
-    (e.g. using <code>collect()</code> on a large dataset).
+    Maximum message size to allow in "control plane" communication; generally only applies to map
+    output size information sent between executors and the driver. Increase this if you are running
+    jobs with many thousands of map and reduce tasks and see messages about the frame size.
   </td>
 </tr>
 <tr>