---
layout: global
title: Launching Spark on YARN
---

Spark allows you to launch jobs on an existing [YARN](http://hadoop.apache.org/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/YARN.html) cluster.

# Preparations

- In order to distribute Spark within the cluster, it must be packaged into a single JAR file. This can be done by running `sbt/sbt assembly`.
- Your application code must be packaged into a separate JAR file. If you want to test out the YARN deployment mode, you can use the current Spark examples. A `spark-examples_2.9.1-0.6.0-SNAPSHOT.jar` file can be generated by running `sbt/sbt package`.

# Launching Spark on YARN

The command to launch the YARN client is as follows:

    SPARK_JAR=<SPARK_ASSEMBLY_JAR_FILE> ./run spark.deploy.yarn.Client \
      --jar <YOUR_APP_JAR_FILE> \
      --class <APP_MAIN_CLASS> \
      --args <APP_MAIN_ARGUMENTS> \
      --num-workers <NUMBER_OF_WORKERS> \
      --worker-memory <MEMORY_PER_WORKER> \
      --worker-cores <CORES_PER_WORKER>

For example:

    SPARK_JAR=./core/target/spark-core-assembly-0.6.0-SNAPSHOT.jar ./run spark.deploy.yarn.Client \
      --jar examples/target/scala-2.9.1/spark-examples_2.9.1-0.6.0-SNAPSHOT.jar \
      --class spark.examples.SparkPi \
      --args standalone \
      --num-workers 3 \
      --worker-memory 2g \
      --worker-cores 2

The above starts a YARN client program which periodically polls the Application Master for status updates and displays them in the console. The client will exit once your application has finished running.

# Important Notes

- When your application instantiates a Spark context, it must use a special "standalone" master URL. This starts the scheduler without forcing it to connect to a cluster. A good way to handle this is to pass "standalone" as an argument to your program, as shown in the example above and in the sketch below.
- YARN does not support requesting container resources based on the number of cores. Thus the number of cores given via command-line arguments cannot be guaranteed.
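
To make the first note concrete, here is a minimal sketch of an application entry point that reads the master URL from its first command-line argument, so that "standalone" can be passed through `--args` when launching on YARN. The object name `MyYarnApp` and the trivial job it runs are hypothetical, not part of the Spark distribution; only the `SparkContext(master, jobName)` constructor reflects the actual API.

    // Hypothetical example application; the point illustrated is taking
    // the master URL from args(0) rather than hard-coding it.
    import spark.SparkContext

    object MyYarnApp {
      def main(args: Array[String]) {
        if (args.length < 1) {
          System.err.println("Usage: MyYarnApp <master>")
          System.exit(1)
        }
        // On YARN this receives "standalone"; for local testing it could
        // receive e.g. "local[2]" instead.
        val sc = new SparkContext(args(0), "MyYarnApp")
        val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
        println("Counted " + evens + " even numbers")
        sc.stop()
      }
    }

Such a JAR would be launched with `--class MyYarnApp --args standalone`, mirroring the SparkPi invocation shown above.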