author     Mridul Muralidharan <mridul@gmail.com>    2013-05-16 15:27:58 +0530
committer  Mridul Muralidharan <mridul@gmail.com>    2013-05-16 15:27:58 +0530
commit     87540a7b386837d177a6d356ad1f5ef2c1ad6ea5 (patch)
tree       8e2f1dc98e094bcfc84cc25749e3b069a4f8b398 /docs/running-on-yarn.md
parent     2f576aba8f6a7a101d2862808d03ec0da5ad00d4 (diff)
Fix running on yarn documentation
Diffstat (limited to 'docs/running-on-yarn.md')
-rw-r--r--  docs/running-on-yarn.md | 26
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index c8cf8ffc35..41c0b235dd 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -11,14 +11,32 @@ Ex: mvn -Phadoop2-yarn clean install
# Building spark core consolidated jar.
-Currently, only sbt can buid a consolidated jar which contains the entire spark code - which is required for launching jars on yarn.
-To do this via sbt - though (right now) is a manual process of enabling it in project/SparkBuild.scala.
+We need a consolidated Spark core jar (which bundles all the required dependencies) to run Spark jobs on a YARN cluster.
+This can be built either through sbt or via maven.
+
+- Building the Spark assembled jar via sbt.
+This is currently a manual process of enabling it in project/SparkBuild.scala (a sketch of the relevant section appears after the diff).
Please comment out the
HADOOP_VERSION, HADOOP_MAJOR_VERSION and HADOOP_YARN
variables before the line 'For Hadoop 2 YARN support'
Next, uncomment the subsequent 3 variable declaration lines (for these three variables) which enable hadoop yarn support.
-Currnetly, it is a TODO to add support for maven assembly.
+Then assemble the jar. Ex:
+./sbt/sbt clean assembly
+
+The assembled jar would typically be something like:
+./streaming/target/spark-streaming-<VERSION>.jar
+
+
+- Building the Spark assembled jar via maven.
+Use the hadoop2-yarn profile and execute the package target.
+
+Ex:
+$ mvn -Phadoop2-yarn clean package -DskipTests=true
+
+
+This will build the shaded (consolidated) jar. Typically something like:
+./repl-bin/target/spark-repl-bin-<VERSION>-shaded-hadoop2-yarn.jar
# Preparations
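
For reference, the project/SparkBuild.scala edit described in the hunk above might look like the following sketch. This is an illustration only: the Hadoop version strings are assumptions, not values taken from this commit.

```scala
// project/SparkBuild.scala (sketch; version strings are illustrative assumptions)

// Comment out the default Hadoop 1 settings before the
// 'For Hadoop 2 YARN support' line:
//val HADOOP_VERSION = "1.0.4"
//val HADOOP_MAJOR_VERSION = "1"
//val HADOOP_YARN = false

// For Hadoop 2 YARN support, uncomment the subsequent three declarations:
val HADOOP_VERSION = "2.0.2-alpha"
val HADOOP_MAJOR_VERSION = "2"
val HADOOP_YARN = true
```

After this edit, `./sbt/sbt clean assembly` picks up the YARN-enabled Hadoop dependencies when building the consolidated jar.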
@@ -62,6 +80,6 @@ The above starts a YARN Client programs which periodically polls the Application
# Important Notes
- When your application instantiates a Spark context it must use a special "standalone" master url. This starts the scheduler without forcing it to connect to a cluster. A good way to handle this is to pass "standalone" as an argument to your program, as shown in the example above.
-- YARN does not support requesting container resources based on the number of cores. Thus the numbers of cores given via command line arguments cannot be guaranteed.
+- We do not request container resources based on the number of cores. Thus the number of cores given via command line arguments cannot be guaranteed.
- Currently, we have not yet integrated with hadoop security. If --user is present, the hadoop_user specified will be used to run the tasks on the cluster. If unspecified, current user will be used (which should be valid in cluster).
Once hadoop security support is added, and if hadoop cluster is enabled with security, additional restrictions would apply via delegation tokens passed.
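
To illustrate the "standalone" master note above, here is a minimal sketch assuming the pre-1.0 `spark.SparkContext` API and that the master url is passed as the program's first argument; the object and app names are hypothetical.

```scala
import spark.SparkContext

object ExampleYarnApp {
  def main(args: Array[String]) {
    // Pass "standalone" as the first argument when launching on YARN;
    // this starts the scheduler without forcing it to connect to a cluster.
    val master = args(0)
    val sc = new SparkContext(master, "ExampleYarnApp")

    // ... application logic using sc ...

    sc.stop()
  }
}
```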