Diffstat (limited to 'docs/job-scheduling.md')
 docs/job-scheduling.md | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index d304c5497b..df2faa5e41 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -32,12 +32,11 @@ Resource allocation can be configured as follows, based on the cluster type:
* **Standalone mode:** By default, applications submitted to the standalone mode cluster will run in
FIFO (first-in-first-out) order, and each application will try to use all available nodes. You can limit
- the number of nodes an application uses by setting the `spark.cores.max` system property in it. This
- will allow multiple users/applications to run concurrently. For example, you might launch a long-running
- server that uses 10 cores, and allow users to launch shells that use 20 cores each.
+ the number of nodes an application uses by setting the `spark.cores.max` configuration property in it,
+ or change the default for applications that don't set this property via `spark.deploy.defaultCores`.
Finally, in addition to controlling cores, each application's `spark.executor.memory` setting controls
its memory use.
-* **Mesos:** To use static partitioning on Mesos, set the `spark.mesos.coarse` system property to `true`,
+* **Mesos:** To use static partitioning on Mesos, set the `spark.mesos.coarse` configuration property to `true`,
and optionally set `spark.cores.max` to limit each application's resource share as in the standalone mode.
You should also set `spark.executor.memory` to control the executor memory.
* **YARN:** The `--num-workers` option to the Spark YARN client controls how many workers it will allocate
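
As a concrete illustration of the settings above, a minimal sketch of static partitioning from a Scala application might look like the following; the master URL, application name, and resource values are placeholders, not recommendations:

{% highlight scala %}
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder master URL and app name; cap this application at 10 cores
// cluster-wide and give each executor 2g of memory.
val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("StaticPartitioningExample")
  .set("spark.cores.max", "10")
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)
{% endhighlight %}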
@@ -78,11 +77,13 @@ of cluster resources. This means that short jobs submitted while a long job is r
resources right away and still get good response times, without waiting for the long job to finish. This
mode is best for multi-user settings.
-To enable the fair scheduler, simply set the `spark.scheduler.mode` to `FAIR` before creating
+To enable the fair scheduler, simply set the `spark.scheduler.mode` property to `FAIR` when configuring
a SparkContext:
{% highlight scala %}
-System.setProperty("spark.scheduler.mode", "FAIR")
+val conf = new SparkConf().setMaster(...).setAppName(...)
+conf.set("spark.scheduler.mode", "FAIR")
+val sc = new SparkContext(conf)
{% endhighlight %}
## Fair Scheduler Pools
@@ -91,15 +92,15 @@ The fair scheduler also supports grouping jobs into _pools_, and setting differe
(e.g. weight) for each pool. This can be useful to create a "high-priority" pool for more important jobs,
for example, or to group the jobs of each user together and give _users_ equal shares regardless of how
many concurrent jobs they have instead of giving _jobs_ equal shares. This approach is modeled after the
-[Hadoop Fair Scheduler](http://hadoop.apache.org/docs/stable/fair_scheduler.html).
+[Hadoop Fair Scheduler](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html).
Without any intervention, newly submitted jobs go into a _default pool_, but jobs' pools can be set by
adding the `spark.scheduler.pool` "local property" to the SparkContext in the thread that's submitting them.
This is done as follows:
{% highlight scala %}
-// Assuming context is your SparkContext variable
-context.setLocalProperty("spark.scheduler.pool", "pool1")
+// Assuming sc is your SparkContext variable
+sc.setLocalProperty("spark.scheduler.pool", "pool1")
{% endhighlight %}
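
As a hedged sketch of how this is typically used, a per-user thread could set the pool before triggering its jobs; the pool name and the toy job below are illustrative only:

{% highlight scala %}
// Illustrative: every job triggered from this thread after the call runs in "pool1".
new Thread("user-1-jobs") {
  override def run() {
    sc.setLocalProperty("spark.scheduler.pool", "pool1")
    val count = sc.parallelize(1 to 1000).count()  // scheduled in pool1
    println("pool1 job counted " + count + " elements")
  }
}.start()
{% endhighlight %}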
After setting this local property, _all_ jobs submitted within this thread (by calls in this thread
@@ -108,7 +109,7 @@ it easy to have a thread run multiple jobs on behalf of the same user. If you'd
pool that a thread is associated with, simply call:
{% highlight scala %}
-context.setLocalProperty("spark.scheduler.pool", null)
+sc.setLocalProperty("spark.scheduler.pool", null)
{% endhighlight %}
## Default Behavior of Pools
@@ -138,10 +139,11 @@ properties:
of the cluster. By default, each pool's `minShare` is 0.
The pool properties can be set by creating an XML file, similar to `conf/fairscheduler.xml.template`,
-and setting the `spark.scheduler.allocation.file` property:
+and setting a `spark.scheduler.allocation.file` property in your
+[SparkConf](configuration.html#spark-properties).
{% highlight scala %}
-System.setProperty("spark.scheduler.allocation.file", "/path/to/file")
+conf.set("spark.scheduler.allocation.file", "/path/to/file")
{% endhighlight %}
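
Putting the pieces together, a small illustrative sketch might point the application at an allocation file and then submit work to one of the pools it defines; the file path and the pool name "production" are assumptions for the example:

{% highlight scala %}
// Assumed path; the pool names referenced below must be defined in this XML file.
conf.set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
val sc = new SparkContext(conf)
sc.setLocalProperty("spark.scheduler.pool", "production")  // hypothetical pool from the file
{% endhighlight %}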
The format of the XML file is simply a `<pool>` element for each pool, with different elements