author    Matei Zaharia <matei@databricks.com>  2015-07-22 15:28:09 -0700
committer Matei Zaharia <matei@databricks.com>  2015-07-22 15:28:09 -0700
commit    fe26584a1f5b472fb2e87aa7259aec822a619a3b
tree      d568c3aeda422e91d2b3d1a9335605da55be73fa /docs
parent    1aca9c13c144fa336af6afcfa666128bf77c49d4
[SPARK-9244] Increase some memory defaults
There are a few memory limits that people hit often and that we could
make higher, especially now that memory sizes have grown.
- spark.akka.frameSize: This defaults to 10 (MB) but is often hit for map
  output statuses in large shuffles. This memory is not fully allocated
  up-front, so we can simply make it larger without affecting jobs that
  never send a status that large. We increase it to 128.
- spark.executor.memory: Defaults to 512m, which is really small. We
  increase it to 1g.
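For jobs that need even more than the new defaults, both settings can still be overridden; a minimal sketch of a conf/spark-defaults.conf entry (the values here are illustrative, not recommendations):

```
# conf/spark-defaults.conf -- illustrative overrides of the two settings
# whose defaults this commit raises
spark.executor.memory   4g
spark.akka.frameSize    256
```

The same keys can also be set programmatically via SparkConf.set or on the spark-submit command line with --conf.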
Author: Matei Zaharia <matei@databricks.com>
Closes #7586 from mateiz/configs and squashes the following commits:
ce0038a [Matei Zaharia] [SPARK-9244] Increase some memory defaults
Diffstat (limited to 'docs')
-rw-r--r--  docs/configuration.md | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index 8a186ee51c..fea259204a 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -31,7 +31,6 @@ which can help detect bugs that only exist when we run in a distributed context.

 val conf = new SparkConf()
              .setMaster("local[2]")
              .setAppName("CountingSheep")
-             .set("spark.executor.memory", "1g")
 val sc = new SparkContext(conf)
 {% endhighlight %}
@@ -84,7 +83,7 @@ Running `./bin/spark-submit --help` will show the entire list of these options.
 each line consists of a key and a value separated by whitespace. For example:

   spark.master            spark://5.6.7.8:7077
-  spark.executor.memory   512m
+  spark.executor.memory   4g
   spark.eventLog.enabled  true
   spark.serializer        org.apache.spark.serializer.KryoSerializer
@@ -150,10 +149,9 @@ of the most common options to set are:
 </tr>
 <tr>
   <td><code>spark.executor.memory</code></td>
-  <td>512m</td>
+  <td>1g</td>
   <td>
-    Amount of memory to use per executor process, in the same format as JVM memory strings
-    (e.g. <code>512m</code>, <code>2g</code>).
+    Amount of memory to use per executor process (e.g. <code>2g</code>, <code>8g</code>).
   </td>
 </tr>
 <tr>
@@ -886,11 +884,11 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.akka.frameSize</code></td>
-  <td>10</td>
+  <td>128</td>
   <td>
-    Maximum message size to allow in "control plane" communication (for serialized tasks and task
-    results), in MB. Increase this if your tasks need to send back large results to the driver
-    (e.g. using <code>collect()</code> on a large dataset).
+    Maximum message size to allow in "control plane" communication; generally only applies to map
+    output size information sent between executors and the driver. Increase this if you are running
+    jobs with many thousands of map and reduce tasks and see messages about the frame size.
   </td>
 </tr>
 <tr>