author    Matei Zaharia <matei@eecs.berkeley.edu>    2013-08-27 20:02:30 -0700
committer Matei Zaharia <matei@eecs.berkeley.edu>    2013-08-29 21:19:07 -0700
commit    e11bc18294d9e3f2ea155f5398faf3fb08aa2a59 (patch)
tree      1d75a8bc007a8f66a77f3349c88ca30506211054 /docs/building-with-maven.md
parent    d8a4008685996756c3fc871a5a196a60d8e8989c (diff)
Update Maven docs
Diffstat (limited to 'docs/building-with-maven.md')
-rw-r--r--  docs/building-with-maven.md | 55
1 file changed, 26 insertions(+), 29 deletions(-)
diff --git a/docs/building-with-maven.md b/docs/building-with-maven.md
index 72d37fec0a..7ecb601ddd 100644
--- a/docs/building-with-maven.md
+++ b/docs/building-with-maven.md
@@ -8,53 +8,51 @@ title: Building Spark with Maven
Building Spark using Maven requires Maven 3 (the build process is tested with Maven 3.0.4) and Java 1.6 or newer.
-## Specifying the Hadoop version ##
-To enable support for HDFS and other Hadoop-supported storage systems, specify the exact Hadoop version by setting the "hadoop.version" property. If unset, Spark will build against Hadoop 1.0.4 by default.
+## Setting up Maven's Memory Usage ##
-For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop versions without YARN, use:
+You'll need to configure Maven to use more memory than usual by setting `MAVEN_OPTS`. We recommend the following settings:
-    # Apache Hadoop 1.2.1
-    $ mvn -Dhadoop.version=1.2.1 clean package
+    export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
-    # Cloudera CDH 4.2.0 with MapReduce v1
-    $ mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 clean package
+If you don't run this, you may see errors like the following:
-For Apache Hadoop 2.x, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions with YARN, enable the "hadoop2-yarn" profile:
-
- # Apache Hadoop 2.0.5-alpha
- $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.5-alpha clean package
+    [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_VERSION}}/classes...
+    [ERROR] PermGen space -> [Help 1]
-    # Cloudera CDH 4.2.0 with MapReduce v2
-    $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.0-cdh4.2.0 clean package
+    [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_VERSION}}/classes...
+    [ERROR] Java heap space -> [Help 1]
+You can fix this by setting the `MAVEN_OPTS` variable as discussed before.
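+
+To make the setting persist across shell sessions, you can append the export line to your shell startup file (a minimal sketch, assuming bash; adapt the file name for your shell):
+
+    $ echo 'export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"' >> ~/.bashrc
+    $ source ~/.bashrc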
-## Spark Tests in Maven ##
+## Specifying the Hadoop version ##
-Tests are run by default via the scalatest-maven-plugin. With this you can do things like:
+Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the "hadoop.version" property. If unset, Spark will build against Hadoop 1.0.4 by default.
-Skip test execution (but not compilation):
+For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop versions without YARN, use:
-    $ mvn -Dhadoop.version=... -DskipTests clean package
+    # Apache Hadoop 1.2.1
+    $ mvn -Dhadoop.version=1.2.1 -DskipTests clean package
-To run a specific test suite:
+    # Cloudera CDH 4.2.0 with MapReduce v1
+    $ mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package
-    $ mvn -Dhadoop.version=... -Dsuites=spark.repl.ReplSuite test
+For Apache Hadoop 2.x, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions with YARN, you should also enable the "hadoop2-yarn" profile:
+    # Apache Hadoop 2.0.5-alpha
+    $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.5-alpha -DskipTests clean package
-## Setting up JVM Memory Usage Via Maven ##
+    # Cloudera CDH 4.2.0 with MapReduce v2
+    $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.0-cdh4.2.0 -DskipTests clean package
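+
+If you're not sure which version your HDFS cluster runs, the `hadoop` command-line client can report it (assuming the client is installed on a cluster machine); the first line of output gives the version string to pass as `hadoop.version`:
+
+    $ hadoop version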
-You might run into the following errors if you're using a vanilla installation of Maven:
- [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_VERSION}}/classes...
- [ERROR] PermGen space -> [Help 1]
+## Spark Tests in Maven ##
- [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_VERSION}}/classes...
- [ERROR] Java heap space -> [Help 1]
+Tests are run by default via the [ScalaTest Maven plugin](http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin). Some of the tests require Spark to be packaged first, so always run `mvn package` with `-DskipTests` the first time. You can then run the tests with `mvn -Dhadoop.version=... test`.
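+
+For example, a full build-then-test sequence against the Hadoop 2.0.5-alpha configuration shown above might look like this (a sketch; substitute your own Hadoop version and profile):
+
+    $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.5-alpha -DskipTests clean package
+    $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.5-alpha test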
-To fix these, you can do the following:
+The ScalaTest plugin also supports running only a specific test suite as follows:
-    export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=128M"
+    $ mvn -Dhadoop.version=... -Dsuites=spark.repl.ReplSuite test
## Continuous Compilation ##
@@ -63,8 +61,7 @@ We use the scala-maven-plugin which supports incremental and continuous compilation
    $ mvn scala:cc
-…should run continuous compilation (i.e. wait for changes). However, this has not been tested extensively.
-
+should run continuous compilation (i.e. wait for changes). However, this has not been tested extensively.
## Using With IntelliJ IDEA ##