author Denny <dennybritz@gmail.com> 2012-09-13 09:47:54 -0700
committer Denny <dennybritz@gmail.com> 2012-09-13 09:47:54 -0700
commit 6d53b971b9ce593898fda7705a105400f5ab6a46
tree 916e76c5e76c2c366b6cd5cd8a7da49b4527d755 /docs/running-on-yarn.md
parent d3db46fdef6d1a0012d4ef0ce050614e2a4274fb
Added standalone and YARN docs. Merged standalone cluster into standalone doc
Diffstat (limited to 'docs/running-on-yarn.md')
-rw-r--r-- docs/running-on-yarn.md 42
1 files changed, 42 insertions, 0 deletions
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
new file mode 100644
index 0000000000..3c0e54671b
--- /dev/null
+++ b/docs/running-on-yarn.md
@@ -0,0 +1,42 @@
+---
+layout: global
+title: Launching Spark on YARN
+---
+
+Spark allows you to launch jobs on an existing [YARN](http://hadoop.apache.org/common/docs/r0.23.1/hadoop-yarn/hadoop-yarn-site/YARN.html) cluster.
+
+## Preparations
+
+- To distribute Spark within the cluster, it must be packaged into a single JAR file. This can be done by running `sbt/sbt assembly`.
+- Your application code must be packaged into a separate JAR file.
+
+If you want to try out the YARN deployment mode, you can use the current Spark examples. A `spark-examples_2.9.1-0.6.0-SNAPSHOT.jar` file can be generated by running `sbt/sbt package`.
+
+## Launching Spark on YARN
+
+The command to launch the YARN Client is as follows:
+
+    SPARK_JAR=<SPARK_JAR_FILE> ./run spark.deploy.yarn.Client \
+      --jar <YOUR_APP_JAR_FILE> \
+      --class <APP_MAIN_CLASS> \
+      --args <APP_MAIN_ARGUMENTS> \
+      --num-workers <NUMBER_OF_WORKER_MACHINES> \
+      --worker-memory <MEMORY_PER_WORKER> \
+      --worker-cores <CORES_PER_WORKER>
+
+For example:
+
+    SPARK_JAR=./core/target/spark-core-assembly-0.6.0-SNAPSHOT.jar ./run spark.deploy.yarn.Client \
+      --jar examples/target/scala-2.9.1/spark-examples_2.9.1-0.6.0-SNAPSHOT.jar \
+      --class spark.examples.SparkPi \
+      --args standalone \
+      --num-workers 3 \
+      --worker-memory 2g \
+      --worker-cores 2
+
+The command above starts a YARN client program, which periodically polls the Application Master for status updates and displays them in the console. The client exits once your application has finished running.
+
+## Important Notes
+
+- When your application instantiates a `SparkContext`, it must use the special master URL "standalone". This starts the scheduler without forcing it to connect to a cluster. A good way to handle this is to pass "standalone" as an argument to your program, as shown in the example above.
+- YARN does not support requesting container resources based on the number of cores. Thus, the number of cores requested with `--worker-cores` cannot be guaranteed.
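The launch command described under "Launching Spark on YARN" can be wrapped in a small Bash function that first checks that the JARs from "Preparations" exist and always passes `standalone` as the application argument, as the "Important Notes" require. This is a minimal sketch, assuming Bash and that it is run from the Spark source root where `./run` lives. `require_jar`, `launch_on_yarn`, and the `DRY_RUN` variable are hypothetical helpers, not part of Spark, and the only Client flags used are the ones listed in the document.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Spark) around spark.deploy.yarn.Client.
# Assumes Bash and the Spark source root as the working directory.

# Return non-zero if a JAR is missing or empty (see "Preparations").
require_jar() {
  if [ ! -s "$1" ]; then
    echo "missing JAR: $1 (build with sbt/sbt assembly or sbt/sbt package)" >&2
    return 1
  fi
}

# Usage: launch_on_yarn SPARK_JAR APP_JAR MAIN_CLASS NUM_WORKERS MEMORY CORES
# Always passes "standalone" as the application argument so the program
# can use it as the master URL when it creates its SparkContext.
launch_on_yarn() {
  local spark_jar=$1 app_jar=$2 main_class=$3
  local workers=$4 memory=$5 cores=$6
  require_jar "$spark_jar" || return 1
  require_jar "$app_jar" || return 1
  # Build the command as an array so paths containing spaces stay intact.
  local cmd=(./run spark.deploy.yarn.Client
    --jar "$app_jar"
    --class "$main_class"
    --args standalone
    --num-workers "$workers"
    --worker-memory "$memory"
    --worker-cores "$cores")
  if [ -n "${DRY_RUN:-}" ]; then
    # Hypothetical dry-run mode: print the command instead of running it.
    echo "SPARK_JAR=$spark_jar ${cmd[*]}"
  else
    SPARK_JAR="$spark_jar" "${cmd[@]}"
  fi
}

# Example, using the paths from the SparkPi example:
# DRY_RUN=1 launch_on_yarn \
#   ./core/target/spark-core-assembly-0.6.0-SNAPSHOT.jar \
#   examples/target/scala-2.9.1/spark-examples_2.9.1-0.6.0-SNAPSHOT.jar \
#   spark.examples.SparkPi 3 2g 2
```

Setting `DRY_RUN` prints the assembled command without contacting the cluster, which is a cheap way to check paths before submitting. Since `--worker-cores` cannot be guaranteed by YARN, the cores value here is only a request.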