Diffstat (limited to 'docs/running-on-yarn.md')
-rw-r--r-- | docs/running-on-yarn.md | 31 |
1 file changed, 31 insertions, 0 deletions
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 4e92042da6..befd3eaee9 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -539,6 +539,37 @@ launch time. This is done by listing them in the `spark.yarn.access.namenodes` p
 spark.yarn.access.namenodes hdfs://ireland.example.org:8020/,hdfs://frankfurt.example.org:8020/
 ```
 
+## Configuring the External Shuffle Service
+
+To start the Spark Shuffle Service on each `NodeManager` in your YARN cluster, follow these
+instructions:
+
+1. Build Spark with the [YARN profile](building-spark.html). Skip this step if you are using a
+pre-packaged distribution.
+1. Locate the `spark-<version>-yarn-shuffle.jar`. This should be under
+`$SPARK_HOME/common/network-yarn/target/scala-<version>` if you are building Spark yourself, and under
+`lib` if you are using a distribution.
+1. Add this jar to the classpath of all `NodeManager`s in your cluster.
+1. In the `yarn-site.xml` on each node, add `spark_shuffle` to `yarn.nodemanager.aux-services`,
+then set `yarn.nodemanager.aux-services.spark_shuffle.class` to
+`org.apache.spark.network.yarn.YarnShuffleService`.
+1. Restart all `NodeManager`s in your cluster.
+
+The following extra configuration options are available when the shuffle service is running on YARN:
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+  <td><code>spark.yarn.shuffle.stopOnFailure</code></td>
+  <td><code>false</code></td>
+  <td>
+    Whether to stop the NodeManager when there's a failure in the Spark Shuffle Service's
+    initialization. This prevents application failures caused by running containers on
+    NodeManagers where the Spark Shuffle Service is not running.
+  </td>
+</tr>
+</table>
+
 ## Launching your application with Apache Oozie
 
 Apache Oozie can launch Spark applications as part of a workflow.
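
For reference, the `yarn-site.xml` step in the added section could look roughly like the fragment below. This is a minimal sketch, not part of the patch: the `mapreduce_shuffle` entry is an assumed pre-existing auxiliary service, and placing `spark.yarn.shuffle.stopOnFailure` in `yarn-site.xml` assumes the shuffle service reads it from the NodeManager's Hadoop configuration. The two `aux-services` property names and the class name are taken from the added text.

```xml
<!-- yarn-site.xml on each NodeManager host (sketch; merge into the existing <configuration>) -->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <!-- mapreduce_shuffle is an assumed existing entry; keep whatever is already listed -->
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
  <property>
    <!-- Optional; the documented default is false. Assumed to be read from this file. -->
    <name>spark.yarn.shuffle.stopOnFailure</name>
    <value>true</value>
  </property>
</configuration>
```

Setting `spark.yarn.shuffle.stopOnFailure` to `true` trades availability for safety: a NodeManager whose shuffle service fails to initialize stops instead of accepting containers it cannot serve. The change only takes effect after the `NodeManager` restart described in the last step.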