Diffstat (limited to 'docs/running-on-yarn.md')
-rw-r--r-- docs/running-on-yarn.md | 39
1 files changed, 11 insertions, 28 deletions
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 678cd57aba..fe5334ffdc 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -3,50 +3,33 @@ layout: global
title: Launching Spark on YARN
---
-Experimental support for running over a [YARN (Hadoop
+Support for running on [YARN (Hadoop
NextGen)](http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-yarn/hadoop-yarn-site/YARN.html)
-cluster was added to Spark in version 0.6.0. This was merged into master as part of 0.7 effort.
-To build spark with YARN support, please use the hadoop2-yarn profile.
-Ex: mvn -Phadoop2-yarn clean install
+was added to Spark in version 0.6.0, and improved in 0.7.0 and 0.8.0.
-# Building spark core consolidated jar.
+# Building a YARN-Enabled Assembly JAR
-We need a consolidated spark core jar (which bundles all the required dependencies) to run Spark jobs on a yarn cluster.
-This can be built either through sbt or via maven.
+We need a consolidated Spark JAR (which bundles all the required dependencies) to run Spark jobs on a YARN cluster.
+This can be built by setting the Hadoop version and `SPARK_YARN` environment variable, as follows:
-- Building spark assembled jar via sbt.
-Enable YARN support by setting `SPARK_YARN=true` when invoking sbt:
+ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true ./sbt/sbt assembly
- SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true ./sbt/sbt clean assembly
-
-The assembled jar would typically be something like :
-`./yarn/target/spark-yarn-assembly-0.8.0-SNAPSHOT.jar`
-
-
-- Building spark assembled jar via Maven.
- Use the hadoop2-yarn profile and execute the package target.
-
-Something like this. Ex:
-
- mvn -Phadoop2-yarn -Dhadoop.version=2.0.5-alpha clean package -DskipTests=true
-
-
-This will build the shaded (consolidated) jar. Typically something like :
-`./yarn/target/spark-yarn-bin-<VERSION>-shaded.jar`
+The assembled JAR will be something like this:
+`./assembly/target/scala-{{site.SCALA_VERSION}}/spark-assembly_{{site.SPARK_VERSION}}-hadoop2.0.5.jar`.
# Preparations
-- Building spark-yarn assembly (see above).
+- Building a YARN-enabled assembly (see above).
- Your application code must be packaged into a separate JAR file.
-If you want to test out the YARN deployment mode, you can use the current Spark examples. A `spark-examples_{{site.SCALA_VERSION}}-{{site.SPARK_VERSION}}` file can be generated by running `sbt/sbt package`. NOTE: since the documentation you're reading is for Spark version {{site.SPARK_VERSION}}, we are assuming here that you have downloaded Spark {{site.SPARK_VERSION}} or checked it out of source control. If you are using a different version of Spark, the version numbers in the jar generated by the sbt package command will obviously be different.
+If you want to test out the YARN deployment mode, you can use the current Spark examples. A `spark-examples_{{site.SCALA_VERSION}}-{{site.SPARK_VERSION}}` file can be generated by running `sbt/sbt assembly`. NOTE: since the documentation you're reading is for Spark version {{site.SPARK_VERSION}}, we are assuming here that you have downloaded Spark {{site.SPARK_VERSION}} or checked it out of source control. If you are using a different version of Spark, the version numbers in the jar generated by the sbt package command will obviously be different.
# Configuration
Most of the configs are the same for Spark on YARN as other deploys. See the Configuration page for more information on those. These are configs that are specific to SPARK on YARN.
-* `SPARK_YARN_USER_ENV`, to add environment variables to the Spark processes launched on YARN. This can be a comma separated list of environment variables. ie SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar"
+* `SPARK_YARN_USER_ENV`, to add environment variables to the Spark processes launched on YARN. This can be a comma separated list of environment variables, e.g. `SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar"`.
# Launching Spark on YARN