author     Matei Zaharia <matei@databricks.com>   2014-01-07 14:35:52 -0500
committer  Matei Zaharia <matei@databricks.com>   2014-01-07 14:35:52 -0500
commit     d8bcc8e9a095c1b20dd7a17b6535800d39bff80e (patch)
tree       f3f5a1368a43b765b541be706921903cc6ac8da0 /docs/configuration.md
parent     15d953450167c4ec45c9d0a2c7ab8ee71be2e576 (diff)
Add way to limit default # of cores used by applications on standalone mode
Also documents the spark.deploy.spreadOut option.
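
A rough, illustrative sketch of the new knob (the value 8, the file path, and the flags are examples, not part of this commit): on a shared standalone cluster, the operator could cap the default allocation by passing the property to the master's JVM via `SPARK_JAVA_OPTS` in `conf/spark-env.sh`:

```sh
# conf/spark-env.sh on the standalone master (illustrative values only).
# spark.deploy.defaultCores caps applications that do not set spark.cores.max;
# spark.deploy.spreadOut=false packs executors onto as few nodes as possible
# instead of spreading them across the cluster.
SPARK_JAVA_OPTS="$SPARK_JAVA_OPTS -Dspark.deploy.defaultCores=8 -Dspark.deploy.spreadOut=false"
```

The master reads these properties at startup, so it would need a restart for new defaults to take effect; applications that set `spark.cores.max` explicitly still override `spark.deploy.defaultCores`.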
Diffstat (limited to 'docs/configuration.md')
-rw-r--r--  docs/configuration.md  33
1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index 1d36ecb9c1..52ed59be30 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -77,13 +77,14 @@ there are at least five properties that you will commonly want to control:
</tr>
<tr>
<td>spark.cores.max</td>
- <td>(infinite)</td>
+ <td>(not set)</td>
<td>
When running on a <a href="spark-standalone.html">standalone deploy cluster</a> or a
<a href="running-on-mesos.html#mesos-run-modes">Mesos cluster in "coarse-grained"
sharing mode</a>, the maximum amount of CPU cores to request for the application from
- across the cluster (not from each machine). The default will use all available cores
- offered by the cluster manager.
+ across the cluster (not from each machine). If not set, the default will be
+ <code>spark.deploy.defaultCores</code> on Spark's standalone cluster manager, or
+ infinite (all available cores) on Mesos.
</td>
</tr>
</table>
@@ -404,12 +405,36 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
- <td>spark.log-conf</td>
+ <td>spark.logConf</td>
<td>false</td>
<td>
Log the supplied SparkConf as INFO at start of spark context.
</td>
</tr>
+<tr>
+ <td>spark.deploy.spreadOut</td>
+ <td>true</td>
+ <td>
+ Whether the standalone cluster manager should spread applications out across nodes or try
+ to consolidate them onto as few nodes as possible. Spreading out is usually better for
+ data locality in HDFS, but consolidating is more efficient for compute-intensive workloads. <br/>
+ <b>Note:</b> this setting needs to be configured in the cluster master, not in individual
+ applications; you can set it through <code>SPARK_JAVA_OPTS</code> in <code>spark-env.sh</code>.
+ </td>
+</tr>
+<tr>
+ <td>spark.deploy.defaultCores</td>
+ <td>(infinite)</td>
+ <td>
+ Default number of cores to give to applications in Spark's standalone mode if they don't
+ set <code>spark.cores.max</code>. If not set, applications always get all available
+ cores unless they configure <code>spark.cores.max</code> themselves.
+ Set this lower on a shared cluster to prevent users from grabbing
+ the whole cluster by default. <br/>
+ <b>Note:</b> this setting needs to be configured in the cluster master, not in individual
+ applications; you can set it through <code>SPARK_JAVA_OPTS</code> in <code>spark-env.sh</code>.
+ </td>
+</tr>
</table>
## Viewing Spark Properties
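
For the application side, a minimal Scala sketch (not part of this commit; the master URL, app name, and core count are placeholders) of requesting an explicit cluster-wide cap with `spark.cores.max`, which takes precedence over the master's `spark.deploy.defaultCores`:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: cap this application at 4 cores across the whole
// standalone cluster, regardless of spark.deploy.defaultCores on the master.
val conf = new SparkConf()
  .setMaster("spark://master:7077") // placeholder standalone master URL
  .setAppName("CoreCapExample")     // placeholder application name
  .set("spark.cores.max", "4")
val sc = new SparkContext(conf)
```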