author     Matei Zaharia <matei@databricks.com>   2014-01-07 14:35:52 -0500
committer  Matei Zaharia <matei@databricks.com>   2014-01-07 14:35:52 -0500
commit     d8bcc8e9a095c1b20dd7a17b6535800d39bff80e (patch)
tree       f3f5a1368a43b765b541be706921903cc6ac8da0 /docs/configuration.md
parent     15d953450167c4ec45c9d0a2c7ab8ee71be2e576 (diff)
Add way to limit default # of cores used by applications on standalone mode
Also documents the spark.deploy.spreadOut option.
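
A rough, illustrative sketch of the new knob (the value 8, the file path, and the flags are examples, not part of this commit): on a shared standalone cluster, the operator could cap the default allocation by passing the property to the master's JVM via `SPARK_JAVA_OPTS` in `conf/spark-env.sh`:

```sh
# conf/spark-env.sh on the standalone master (illustrative values only).
# spark.deploy.defaultCores caps applications that do not set spark.cores.max;
# spark.deploy.spreadOut=false packs executors onto as few nodes as possible
# instead of spreading them across the cluster.
SPARK_JAVA_OPTS="$SPARK_JAVA_OPTS -Dspark.deploy.defaultCores=8 -Dspark.deploy.spreadOut=false"
```

The master reads these properties at startup, so it would need a restart for new defaults to take effect; applications that set `spark.cores.max` explicitly still override `spark.deploy.defaultCores`.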
Diffstat (limited to 'docs/configuration.md')
-rw-r--r--  docs/configuration.md  33
1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index 1d36ecb9c1..52ed59be30 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -77,13 +77,14 @@ there are at least five properties that you will commonly want to control:
</tr>
<tr>
<td>spark.cores.max</td>
- <td>(infinite)</td>
+ <td>(not set)</td>
<td>
When running on a <a href="spark-standalone.html">standalone deploy cluster</a> or a
<a href="running-on-mesos.html#mesos-run-modes">Mesos cluster in "coarse-grained"
sharing mode</a>, the maximum amount of CPU cores to request for the application from
- across the cluster (not from each machine). The default will use all available cores
- offered by the cluster manager.
+ across the cluster (not from each machine). If not set, the default will be
+ <code>spark.deploy.defaultCores</code> on Spark's standalone cluster manager, or
+ infinite (all available cores) on Mesos.
</td>
</tr>
</table>
@@ -404,12 +405,36 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
- <td>spark.log-conf</td>
+ <td>spark.logConf</td>
<td>false</td>
<td>
Log the supplied SparkConf as INFO at start of spark context.
</td>
</tr>
+<tr>
+ <td>spark.deploy.spreadOut</td>
+ <td>true</td>
+ <td>
+ Whether the standalone cluster manager should spread applications out across nodes or try
+ to consolidate them onto as few nodes as possible. Spreading out is usually better for
+ data locality in HDFS, but consolidating is more efficient for compute-intensive workloads. <br/>
+ <b>Note:</b> this setting needs to be configured in the cluster master, not in individual
+ applications; you can set it through <code>SPARK_JAVA_OPTS</code> in <code>spark-env.sh</code>.
+ </td>
+</tr>
+<tr>
+ <td>spark.deploy.defaultCores</td>
+ <td>(infinite)</td>
+ <td>
+ Default number of cores to give to applications in Spark's standalone mode if they don't
+ set <code>spark.cores.max</code>. If not set, applications always get all available
+ cores unless they configure <code>spark.cores.max</code> themselves.
+ Set this lower on a shared cluster to prevent users from grabbing
+ the whole cluster by default. <br/>
+ <b>Note:</b> this setting needs to be configured in the cluster master, not in individual
+ applications; you can set it through <code>SPARK_JAVA_OPTS</code> in <code>spark-env.sh</code>.
+ </td>
+</tr>
</table>
## Viewing Spark Properties
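
For the application side, a minimal Scala sketch (not part of this commit; the master URL, app name, and core count are placeholders) of requesting an explicit cluster-wide cap with `spark.cores.max`, which takes precedence over the master's `spark.deploy.defaultCores`:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: cap this application at 4 cores across the whole
// standalone cluster, regardless of spark.deploy.defaultCores on the master.
val conf = new SparkConf()
  .setMaster("spark://master:7077") // placeholder standalone master URL
  .setAppName("CoreCapExample")     // placeholder application name
  .set("spark.cores.max", "4")
val sc = new SparkContext(conf)
```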