Diffstat (limited to 'docs/job-scheduling.md')
-rw-r--r-- | docs/job-scheduling.md | 26
1 file changed, 14 insertions, 12 deletions
diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index d304c5497b..df2faa5e41 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -32,12 +32,11 @@ Resource allocation can be configured as follows, based on the cluster type:
 
 * **Standalone mode:** By default, applications submitted to the standalone mode cluster will run in
   FIFO (first-in-first-out) order, and each application will try to use all available nodes. You can limit
-  the number of nodes an application uses by setting the `spark.cores.max` system property in it. This
-  will allow multiple users/applications to run concurrently. For example, you might launch a long-running
-  server that uses 10 cores, and allow users to launch shells that use 20 cores each.
+  the number of nodes an application uses by setting the `spark.cores.max` configuration property in it,
+  or change the default for applications that don't set this setting through `spark.deploy.defaultCores`.
   Finally, in addition to controlling cores, each application's `spark.executor.memory` setting controls
   its memory use.
-* **Mesos:** To use static partitioning on Mesos, set the `spark.mesos.coarse` system property to `true`,
+* **Mesos:** To use static partitioning on Mesos, set the `spark.mesos.coarse` configuration property to `true`,
   and optionally set `spark.cores.max` to limit each application's resource share as in the standalone mode.
   You should also set `spark.executor.memory` to control the executor memory.
 * **YARN:** The `--num-workers` option to the Spark YARN client controls how many workers it will allocate
@@ -78,11 +77,13 @@ of cluster resources. This means that short jobs submitted while a long job is r
 resources right away and still get good response times, without waiting for the long job to finish. This
 mode is best for multi-user settings.
 
-To enable the fair scheduler, simply set the `spark.scheduler.mode` to `FAIR` before creating
+To enable the fair scheduler, simply set the `spark.scheduler.mode` property to `FAIR` when configuring
 a SparkContext:
 
 {% highlight scala %}
-System.setProperty("spark.scheduler.mode", "FAIR")
+val conf = new SparkConf().setMaster(...).setAppName(...)
+conf.set("spark.scheduler.mode", "FAIR")
+val sc = new SparkContext(conf)
 {% endhighlight %}
 
 ## Fair Scheduler Pools
@@ -91,15 +92,15 @@ The fair scheduler also supports grouping jobs into _pools_, and setting differe
 (e.g. weight) for each pool. This can be useful to create a "high-priority" pool for more important
 jobs, for example, or to group the jobs of each user together and give _users_ equal shares regardless of
 how many concurrent jobs they have instead of giving _jobs_ equal shares. This approach is modeled after the
-[Hadoop Fair Scheduler](http://hadoop.apache.org/docs/stable/fair_scheduler.html).
+[Hadoop Fair Scheduler](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html).
 
 Without any intervention, newly submitted jobs go into a _default pool_, but jobs' pools can be set by
 adding the `spark.scheduler.pool` "local property" to the SparkContext in the thread that's submitting
 them. This is done as follows:
 
 {% highlight scala %}
-// Assuming context is your SparkContext variable
-context.setLocalProperty("spark.scheduler.pool", "pool1")
+// Assuming sc is your SparkContext variable
+sc.setLocalProperty("spark.scheduler.pool", "pool1")
 {% endhighlight %}
 
 After setting this local property, _all_ jobs submitted within this thread (by calls in this thread
@@ -108,7 +109,7 @@ it easy to have a thread run multiple jobs on behalf of the same user. If you'd
 pool that a thread is associated with, simply call:
 
 {% highlight scala %}
-context.setLocalProperty("spark.scheduler.pool", null)
+sc.setLocalProperty("spark.scheduler.pool", null)
 {% endhighlight %}
 
 ## Default Behavior of Pools
@@ -138,10 +139,11 @@ properties:
   of the cluster. By default, each pool's `minShare` is 0.
 
 The pool properties can be set by creating an XML file, similar to `conf/fairscheduler.xml.template`,
-and setting the `spark.scheduler.allocation.file` property:
+and setting a `spark.scheduler.allocation.file` property in your
+[SparkConf](configuration.html#spark-properties).
 
 {% highlight scala %}
-System.setProperty("spark.scheduler.allocation.file", "/path/to/file")
+conf.set("spark.scheduler.allocation.file", "/path/to/file")
 {% endhighlight %}
 
 The format of the XML file is simply a `<pool>` element for each pool, with different elements
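
The snippets touched by this patch all move from `System.setProperty` to `SparkConf`. For context, here is a minimal end-to-end sketch of the updated fair-scheduling workflow; the master URL, application name, pool name, and allocation-file path are illustrative placeholders, not values taken from the patch.

{% highlight scala %}
import org.apache.spark.{SparkConf, SparkContext}

// Build a SparkConf instead of calling System.setProperty
// (master URL and app name below are placeholders).
val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("fair-scheduling-example")
  .set("spark.scheduler.mode", "FAIR")
  // Optional: point at a pool definition file modeled on conf/fairscheduler.xml.template.
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")

val sc = new SparkContext(conf)

// Jobs submitted from this thread go into the "pool1" pool (hypothetical pool name).
sc.setLocalProperty("spark.scheduler.pool", "pool1")
sc.parallelize(1 to 1000).count()

// Clear the association so later jobs from this thread fall back to the default pool.
sc.setLocalProperty("spark.scheduler.pool", null)

sc.stop()
{% endhighlight %}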