author    Jey Kottalam <jey@cs.berkeley.edu>    2013-08-21 14:51:56 -0700
committer Jey Kottalam <jey@cs.berkeley.edu>    2013-08-21 14:51:56 -0700
commit    6585f49841ada637b0811e0aadcf93132fff7001 (patch)
tree      1ad14c8bb54de48a96bfe6882436e9eb3d4175fb /README.md
parent    66e7a38a3229eeb6d980193048ebebcda1522acb (diff)
Update build docs
Diffstat (limited to 'README.md')
-rw-r--r--  README.md | 46 ++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 42 insertions(+), 4 deletions(-)
diff --git a/README.md b/README.md
index 1dd96a0a4a..1e388ff380 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@ Spark requires Scala 2.9.3 (Scala 2.10 is not yet supported). The project is
built using Simple Build Tool (SBT), which is packaged with it. To build
Spark and its example programs, run:
- sbt/sbt package
+ sbt/sbt package assembly
Spark also supports building using Maven. If you would like to build using Maven,
see the [instructions for building Spark with Maven](http://spark-project.org/docs/latest/building-with-maven.html)
@@ -43,10 +43,48 @@ locally with one thread, or "local[N]" to run locally with N threads.
## A Note About Hadoop Versions
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
-storage systems. Because the HDFS API has changed in different versions of
+storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
-You can change the version by setting the `HADOOP_VERSION` variable at the top
-of `project/SparkBuild.scala`, then rebuilding Spark.
+You can change the version by setting the `SPARK_HADOOP_VERSION` environment
+variable when building Spark.
+
+For Apache Hadoop versions 1.x, 0.20.x, Cloudera CDH MRv1, and other Hadoop
+versions without YARN, use:
+
+    # Apache Hadoop 1.2.1
+    $ SPARK_HADOOP_VERSION=1.2.1 sbt/sbt package assembly
+
+    # Cloudera CDH 4.2.0 with MapReduce v1
+    $ SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt package assembly
+
+For Apache Hadoop 2.x, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
+with YARN, also set `SPARK_WITH_YARN=true`:
+
+    # Apache Hadoop 2.0.5-alpha
+    $ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_WITH_YARN=true sbt/sbt package assembly
+
+    # Cloudera CDH 4.2.0 with MapReduce v2
+    $ SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 SPARK_WITH_YARN=true sbt/sbt package assembly
+
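A side note on the invocation style above: in POSIX shells, prefixing a command with `VAR=value` sets that variable only for that single command, which is why each build line can pin its own Hadoop version without affecting the rest of the session. A minimal sketch (`sh -c 'echo ...'` stands in for the real `sbt/sbt package assembly`):

```shell
# A variable assignment before a command applies only to that command;
# `sh -c 'echo ...'` is a stand-in for the real `sbt/sbt package assembly`.
SPARK_HADOOP_VERSION=1.2.1 sh -c 'echo "building against Hadoop $SPARK_HADOOP_VERSION"'

# The variable does not leak into the surrounding shell afterwards.
echo "after: '$SPARK_HADOOP_VERSION'"
```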
+For convenience, these variables may also be set through the `conf/spark-env.sh` file
+described below.
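For a persistent setup, the same variables can live in `conf/spark-env.sh` instead of being repeated on every command line. A hypothetical fragment (the exact file contents are an assumption; only the variable names come from the build examples above):

```shell
# Hypothetical conf/spark-env.sh fragment -- sourced by Spark's scripts,
# so the sbt build picks these up without per-command prefixes.
export SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0
export SPARK_WITH_YARN=true
```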
+
+When developing a Spark application, specify the Hadoop version by adding the
+"hadoop-client" artifact to your project's dependencies. For example, if you're
+using Hadoop 0.23.9 and build your application using SBT, add this to
+`libraryDependencies`:
+
+    // "force()" is required because "0.23.9" is less than Spark's default of "1.0.4"
+    "org.apache.hadoop" % "hadoop-client" % "0.23.9" force()
+
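For context, the dependency line above would sit inside a `libraryDependencies` setting in the project's build definition. A minimal hypothetical `build.sbt` fragment (the surrounding setting is an assumption; only the `hadoop-client` line comes from the README):

```scala
// Hypothetical build.sbt fragment; only the hadoop-client line is from the README.
// force() pins this exact version even though Spark's default (1.0.4) is newer.
libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-client" % "0.23.9" force()
)
```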
+If your project is built with Maven, add this to your POM file's `<dependencies>` section:
+
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-client</artifactId>
+      <!-- the brackets are needed to tell Maven that this is a hard dependency on version "0.23.9" exactly -->
+      <version>[0.23.9]</version>
+    </dependency>
## Configuration