author     Matei Zaharia <matei@databricks.com>  2013-12-30 22:17:28 -0500
committer  Matei Zaharia <matei@databricks.com>  2013-12-30 22:17:28 -0500
commit     0fa5809768cf60ec62b4277f04e23a44dc1582e2 (patch)
tree       fee16620755769a70975c41d894db43633b18098 /docs/job-scheduling.md
parent     994f080f8ae3372366e6004600ba791c8a372ff0 (diff)
Updated docs for SparkConf and handled review comments
Diffstat (limited to 'docs/job-scheduling.md')
-rw-r--r--  docs/job-scheduling.md | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index dbcb9ae343..5951155fe3 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -32,12 +32,12 @@ Resource allocation can be configured as follows, based on the cluster type:
 
 * **Standalone mode:** By default, applications submitted to the standalone mode cluster will run in
   FIFO (first-in-first-out) order, and each application will try to use all available nodes. You can limit
-  the number of nodes an application uses by setting the `spark.cores.max` system property in it. This
+  the number of nodes an application uses by setting the `spark.cores.max` configuration property in it. This
   will allow multiple users/applications to run concurrently. For example, you might launch a long-running
   server that uses 10 cores, and allow users to launch shells that use 20 cores each. Finally, in addition
   to controlling cores, each application's `spark.executor.memory` setting controls its memory use.
-* **Mesos:** To use static partitioning on Mesos, set the `spark.mesos.coarse` system property to `true`,
+* **Mesos:** To use static partitioning on Mesos, set the `spark.mesos.coarse` configuration property to `true`,
   and optionally set `spark.cores.max` to limit each application's resource share as in the standalone mode.
   You should also set `spark.executor.memory` to control the executor memory.
 * **YARN:** The `--num-workers` option to the Spark YARN client controls how many workers it will allocate
@@ -78,11 +78,13 @@ of cluster resources. This means that short jobs submitted while a long job is r
 resources right away and still get good response times, without waiting for the long job to finish. This
 mode is best for multi-user settings.
 
-To enable the fair scheduler, simply set the `spark.scheduler.mode` to `FAIR` before creating
+To enable the fair scheduler, simply set the `spark.scheduler.mode` property to `FAIR` when configuring
 a SparkContext:
 
 {% highlight scala %}
-System.setProperty("spark.scheduler.mode", "FAIR")
+val conf = new SparkConf().setMaster(...).setAppName(...)
+conf.set("spark.scheduler.mode", "FAIR")
+val sc = new SparkContext(conf)
 {% endhighlight %}
 
 ## Fair Scheduler Pools
@@ -98,8 +100,8 @@ adding the `spark.scheduler.pool` "local property" to the SparkContext in the th
 This is done as follows:
 
 {% highlight scala %}
-// Assuming context is your SparkContext variable
-context.setLocalProperty("spark.scheduler.pool", "pool1")
+// Assuming sc is your SparkContext variable
+sc.setLocalProperty("spark.scheduler.pool", "pool1")
 {% endhighlight %}
 
 After setting this local property, _all_ jobs submitted within this thread (by calls in this thread
@@ -108,7 +110,7 @@ it easy to have a thread run multiple jobs on behalf of the same user. If you'd
 pool that a thread is associated with, simply call:
 
 {% highlight scala %}
-context.setLocalProperty("spark.scheduler.pool", null)
+sc.setLocalProperty("spark.scheduler.pool", null)
 {% endhighlight %}
 
 ## Default Behavior of Pools
@@ -138,10 +140,11 @@ properties:
 of the cluster. By default, each pool's `minShare` is 0.
 
 The pool properties can be set by creating an XML file, similar to `conf/fairscheduler.xml.template`,
-and setting the `spark.scheduler.allocation.file` property:
+and setting a `spark.scheduler.allocation.file` property in your
+[SparkConf](configuration.html#spark-properties).
 
 {% highlight scala %}
-System.setProperty("spark.scheduler.allocation.file", "/path/to/file")
+conf.set("spark.scheduler.allocation.file", "/path/to/file")
 {% endhighlight %}
 
 The format of the XML file is simply a `<pool>` element for each pool, with different elements
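The diff points at `conf/fairscheduler.xml.template` for the allocation file format but does not show its contents. As a sketch of what such a file can look like, based on that template (the pool name `production` and the specific values are illustrative, not from this commit):

```xml
<?xml version="1.0"?>
<allocations>
  <!-- One <pool> element per named scheduler pool -->
  <pool name="production">
    <!-- Scheduling mode among jobs within this pool: FAIR or FIFO -->
    <schedulingMode>FAIR</schedulingMode>
    <!-- Relative share of the cluster compared to other pools (default 1) -->
    <weight>1</weight>
    <!-- Minimum number of cores the scheduler tries to give this pool (default 0) -->
    <minShare>2</minShare>
  </pool>
</allocations>
```

A thread then routes its jobs into this pool with `sc.setLocalProperty("spark.scheduler.pool", "production")`, matching the `name` attribute above.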