Updated documentation about the YARN v2.2 build process

author: Ali Ghodsi <alig@cs.berkeley.edu> 2013-12-06 00:43:12 -0800
committer: Ali Ghodsi <alig@cs.berkeley.edu> 2013-12-06 16:31:26 -0800
commit: f2fb4b422863059476816df07ca7ea18f62e3a9d (patch)
tree: 670809f99ec2e614962175aa6c4c2be78bf66cf2 /docs/running-on-yarn.md
parent: 5d460253d6080d871cb71efb112ea17be0873771 (diff)
1 files changed, 8 insertions, 0 deletions
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 68fd6c2ab1..3ec656c469 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -17,6 +17,7 @@ This can be built by setting the Hadoop version and `SPARK_YARN` environment var
 The assembled JAR will be something like this:
 `./assembly/target/scala-{{site.SCALA_VERSION}}/spark-assembly_{{site.SPARK_VERSION}}-hadoop2.0.5.jar`.
 
+The build process now also supports newer YARN versions (2.2.x). See "Building Spark for Hadoop/YARN 2.2.x" below.
 
 # Preparations
 
@@ -111,9 +112,16 @@ For example:
     SPARK_YARN_APP_JAR=examples/target/scala-{{site.SCALA_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
     MASTER=yarn-client ./spark-shell
 
+# Building Spark for Hadoop/YARN 2.2.x
+
+Hadoop 2.2.x users must build Spark and publish it locally. The SBT build process handles Hadoop 2.2.x as a special case. This version of Hadoop changes the YARN API and depends on a Protobuf version (2.5) that is not compatible with the Akka version (2.0.5) that Spark uses. Therefore, if the Hadoop version (e.g. set through ```SPARK_HADOOP_VERSION```) is 2.2.0 or higher, the build process depends on Akka artifacts distributed by the Spark project that are compatible with Protobuf 2.5. Furthermore, the build process then uses the directory ```new-yarn``` (instead of ```yarn```), which supports the new YARN API. The build process should work out of the box.
+
+See [Building Spark with Maven](building-with-maven.md) for instructions on how to build Spark using Maven.
+
 # Important Notes
 
 - We do not request container resources based on the number of cores. Thus the number of cores given via command-line arguments cannot be guaranteed.
 - The local directories used for Spark will be the local directories configured for YARN (Hadoop YARN config yarn.nodemanager.local-dirs). If the user specifies spark.local.dir, it will be ignored.
 - The --files and --archives options support specifying file names with a # suffix, similar to Hadoop. For example, --files localtest.txt#appSees.txt uploads the local file localtest.txt to HDFS and links it under the name appSees.txt; your application should use the name appSees.txt to reference it when running on YARN.
 - The --addJars option allows the SparkContext.addJar function to work if you are using it with local files. It does not need to be used if you are using it with HDFS, HTTP, HTTPS, or FTP files.
+- YARN 2.2.x users cannot simply depend on the published Spark packages without building Spark, because the published Spark artifacts are compiled against the pre-2.2 API. Those users must build Spark and publish it locally.
\ No newline at end of file
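
The version rule described in the new "Building Spark for Hadoop/YARN 2.2.x" section can be illustrated with a small shell function. This is only a sketch of the stated rule (Hadoop 2.2.0 or higher selects `new-yarn`, anything older selects `yarn`); it is not the logic in the SBT build itself. The function name `yarn_module_dir` is hypothetical, and the sketch assumes the version string starts with a numeric major component.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Spark build: mirrors the rule
# described in the patch. Prints "new-yarn" for Hadoop 2.2 and later,
# and "yarn" for anything older.
yarn_module_dir() {
    version="$1"
    major="${version%%.*}"            # text before the first dot
    case "$version" in
        *.*) rest="${version#*.}" ;;  # text after the first dot
        *)   rest="0" ;;              # no minor component given
    esac
    minor="${rest%%[!0-9]*}"          # leading digits of the minor component
    if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "${minor:-0}" -ge 2 ]; }; then
        echo "new-yarn"
    else
        echo "yarn"
    fi
}
```

For example, `yarn_module_dir "$SPARK_HADOOP_VERSION"` prints `new-yarn` when the variable is set to `2.2.0` and `yarn` when it is set to `2.0.5-alpha`. The comparison is numeric on the major and minor components, so a version such as `2.10.1` is also treated as newer than 2.2.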