author     Matei Zaharia <matei.zaharia@gmail.com>  2013-08-29 21:51:14 -0700
committer  Matei Zaharia <matei.zaharia@gmail.com>  2013-08-29 21:51:14 -0700
commit     ca716209507e4870fbbf55d96ecd57c218d547ac (patch)
tree       6514f95de349f76ad1b775885a1578ccf466805c /docs/building-with-maven.md
parent     15287766281195a019a400fe11b41e96c6edc362 (diff)
parent     e11bc18294d9e3f2ea155f5398faf3fb08aa2a59 (diff)
Merge pull request #857 from mateiz/assembly
Change build and run instructions to use assemblies
Diffstat (limited to 'docs/building-with-maven.md')
-rw-r--r--  docs/building-with-maven.md | 59
1 file changed, 28 insertions(+), 31 deletions(-)
diff --git a/docs/building-with-maven.md b/docs/building-with-maven.md
index a9f2cb8a7a..7ecb601ddd 100644
--- a/docs/building-with-maven.md
+++ b/docs/building-with-maven.md
@@ -8,53 +8,51 @@ title: Building Spark with Maven
Building Spark using Maven requires Maven 3 (the build process is tested with Maven 3.0.4) and Java 1.6 or newer.
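+If you're not sure which versions you have installed, a quick way to check both from the command line:
+ $ mvn -version     # should report Maven 3.x
+ $ java -version    # should report Java 1.6 or newer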
-## Specifying the Hadoop version ##
-To enable support for HDFS and other Hadoop-supported storage systems, specify the exact Hadoop version by setting the "hadoop.version" property. If unset, Spark will build against Hadoop 1.0.4 by default.
+## Setting up Maven's Memory Usage ##
-For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop versions without YARN, use:
+You'll need to configure Maven to use more memory than usual by setting `MAVEN_OPTS`. We recommend the following settings:
- # Apache Hadoop 1.2.1
- $ mvn -Dhadoop.version=1.2.1 clean install
+ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
- # Cloudera CDH 4.2.0 with MapReduce v1
- $ mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 clean install
+If you don't run this, you may see errors like the following:
-For Apache Hadoop 2.x, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions with YARN, enable the "hadoop2-yarn" profile:
-
- # Apache Hadoop 2.0.5-alpha
- $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.5-alpha clean install
+ [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_VERSION}}/classes...
+ [ERROR] PermGen space -> [Help 1]
- # Cloudera CDH 4.2.0 with MapReduce v2
- $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.0-cdh4.2.0 clean install
+ [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_VERSION}}/classes...
+ [ERROR] Java heap space -> [Help 1]
+You can fix this by setting the `MAVEN_OPTS` variable as discussed before.
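+If you'd rather not set this in every new shell, one option is to persist it in your shell profile. A minimal sketch, assuming bash and a `~/.bashrc` profile file:
+ # Append the recommended MAVEN_OPTS to your profile so new shells pick it up
+ $ echo 'export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"' >> ~/.bashrc
+ $ source ~/.bashrc   # reload the profile in the current shell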
-## Spark Tests in Maven ##
+## Specifying the Hadoop version ##
-Tests are run by default via the scalatest-maven-plugin. With this you can do things like:
+Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the "hadoop.version" property. If unset, Spark will build against Hadoop 1.0.4 by default.
-Skip test execution (but not compilation):
+For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop versions without YARN, use:
- $ mvn -Dhadoop.version=... -DskipTests clean install
+ # Apache Hadoop 1.2.1
+ $ mvn -Dhadoop.version=1.2.1 -DskipTests clean package
-To run a specific test suite:
+ # Cloudera CDH 4.2.0 with MapReduce v1
+ $ mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package
- $ mvn -Dhadoop.version=... -Dsuites=spark.repl.ReplSuite test
+For Apache Hadoop 2.x, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions with YARN, you should also enable the "hadoop2-yarn" profile:
+ # Apache Hadoop 2.0.5-alpha
+ $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.5-alpha -DskipTests clean package
-## Setting up JVM Memory Usage Via Maven ##
+ # Cloudera CDH 4.2.0 with MapReduce v2
+ $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.0-cdh4.2.0 -DskipTests clean package
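+As a rough sanity check that the build produced the assembly JAR, you can look under the assembly module's target directory. The exact path depends on your Spark and Scala versions, so treat this glob as an assumption rather than a guaranteed location:
+ # Locate the assembly JAR produced by 'mvn package' (path pattern is an assumption)
+ $ ls assembly/target/scala-*/spark-assembly-*.jar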
-You might run into the following errors if you're using a vanilla installation of Maven:
- [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_VERSION}}/classes...
- [ERROR] PermGen space -> [Help 1]
+## Spark Tests in Maven ##
- [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-{{site.SCALA_VERSION}}/classes...
- [ERROR] Java heap space -> [Help 1]
+Tests are run by default via the [ScalaTest Maven plugin](http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin). Some of the tests require Spark to be packaged first, so always run `mvn package` with `-DskipTests` the first time. You can then run the tests with `mvn -Dhadoop.version=... test`.
-To fix these, you can do the following:
+The ScalaTest plugin also supports running only a specific test suite as follows:
- export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=128M"
+ $ mvn -Dhadoop.version=... -Dsuites=spark.repl.ReplSuite test
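+Putting the two steps together, a first build-and-test cycle might look like this (the Hadoop version is just an illustration; substitute your own):
+ # Package first so the tests have an assembly to run against, then run the tests
+ $ mvn -Dhadoop.version=1.2.1 -DskipTests clean package
+ $ mvn -Dhadoop.version=1.2.1 test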
## Continuous Compilation ##
@@ -63,8 +61,7 @@ We use the scala-maven-plugin which supports incremental and continuous compilat
$ mvn scala:cc
-…should run continuous compilation (i.e. wait for changes). However, this has not been tested extensively.
-
+should run continuous compilation (i.e. wait for changes). However, this has not been tested extensively.
## Using With IntelliJ IDEA ##
@@ -72,8 +69,8 @@ This setup works fine in IntelliJ IDEA 11.1.4. After opening the project via the
## Building Spark Debian Packages ##
-It includes support for building a Debian package containing a 'fat-jar' which includes the repl, the examples and bagel. This can be created by specifying the deb profile:
+The Maven build includes support for building a Debian package containing a 'fat-jar' which includes the REPL, the examples, and Bagel. This can be created by specifying the following profiles:
- $ mvn -Pdeb clean install
+ $ mvn -Prepl-bin -Pdeb clean package
The Debian package can then be found under repl/target. We added the short commit hash to the file name so that we can distinguish individual packages built for SNAPSHOT versions.
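+Once built, the package can be inspected and installed with the standard Debian tooling. A minimal sketch, assuming the generated file name matches this pattern (it varies with the version and commit hash):
+ # List the generated .deb, then install it system-wide
+ $ ls repl/target/*.deb
+ $ sudo dpkg -i repl/target/spark*.deb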