path: root/docs/building-spark.md
author     Brennon York <brennon.york@capitalone.com>   2014-12-27 13:25:18 -0800
committer  Patrick Wendell <pwendell@gmail.com>          2014-12-27 13:26:38 -0800
commit     a3e51cc990812c8099dcaf1f3bd6d5bae45cf8e6 (patch)
tree       b44d82a54f89d9a976a0ebfd3f59657538ddadb8 /docs/building-spark.md
parent     080ceb771a1e6b9f844cfd4f1baa01133c106888 (diff)
[SPARK-4501][Core] - Create build/mvn to automatically download maven/zinc/scalac
Creates a top level directory script (as `build/mvn`) to automatically download zinc and the specific version of scala used to easily build spark. This will also download and install maven if the user doesn't already have it and all packages are hosted under the `build/` directory. Tested on both Linux and OSX OS's and both work. All commands pass through to the maven binary so it acts exactly as a traditional maven call would.

Author: Brennon York <brennon.york@capitalone.com>

Closes #3707 from brennonyork/SPARK-4501 and squashes the following commits:

0e5a0e4 [Brennon York] minor incorrect doc verbage (with -> this)
9b79e38 [Brennon York] fixed merge conflicts with dev/run-tests, properly quoted args in sbt/sbt, fixed bug where relative paths would fail if passed in from build/mvn
d2d41b6 [Brennon York] added blurb about leverging zinc with build/mvn
b979c58 [Brennon York] updated the merge conflict
c5634de [Brennon York] updated documentation to overview build/mvn, updated all points where sbt/sbt was referenced with build/sbt
b8437ba [Brennon York] set progress bars for curl and wget when not run on jenkins, no progress bar when run on jenkins, moved sbt script to build/sbt, wrote stub and warning under sbt/sbt which calls build/sbt, modified build/sbt to use the correct directory, fixed bug in build/sbt-launch-lib.bash to correctly pull the sbt version
be11317 [Brennon York] added switch to silence download progress only if AMPLAB_JENKINS is set
28d0a99 [Brennon York] updated to remove the python dependency, uses grep instead
7e785a6 [Brennon York] added silent and quiet flags to curl and wget respectively, added single echo output to denote start of a download if download is needed
14a5da0 [Brennon York] removed unnecessary zinc output on startup
1af4a94 [Brennon York] fixed bug with uppercase vs lowercase variable
3e8b9b3 [Brennon York] updated to properly only restart zinc if it was freshly installed
a680d12 [Brennon York] Added comments to functions and tested various mvn calls
bb8cc9d [Brennon York] removed package files
ef017e6 [Brennon York] removed OS complexities, setup generic install_app call, removed extra file complexities, removed help, removed forced install (defaults now), removed double-dash from cli
07bf018 [Brennon York] Updated to specifically handle pulling down the correct scala version
f914dea [Brennon York] Beginning final portions of localized scala home
69c4e44 [Brennon York] working linux and osx installers for purely local mvn build
4a1609c [Brennon York] finalizing working linux install for maven to local ./build/apache-maven folder
cbfcc68 [Brennon York] Changed the default sbt/sbt to build/sbt and added a build/mvn which will automatically download, install, and execute maven with zinc for easier build capability
Diffstat (limited to 'docs/building-spark.md')
-rw-r--r--  docs/building-spark.md  45
1 file changed, 31 insertions(+), 14 deletions(-)
diff --git a/docs/building-spark.md b/docs/building-spark.md
index dab3d2aef4..c1bcd91b5b 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -9,6 +9,15 @@ redirect_from: "building-with-maven.html"
Building Spark using Maven requires Maven 3.0.4 or newer and Java 6+.
+# Building with `build/mvn`
+
+Spark now comes packaged with a self-contained Maven installation, located under the `build/` directory, to ease building and deploying Spark from source. This script will automatically download and set up all necessary build requirements ([Maven](https://maven.apache.org/), [Scala](http://www.scala-lang.org/), and [Zinc](https://github.com/typesafehub/zinc)) locally within the `build/` directory itself. It honors any `mvn` binary already present, but will pull down its own copy of Scala and Zinc regardless to ensure the proper version requirements are met. `build/mvn` execution acts as a pass-through to the `mvn` call, allowing an easy transition from previous build methods. As an example, one can build a version of Spark as follows:
+
+{% highlight bash %}
+build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
+{% endhighlight %}
+
+Other build examples can be found below.
# Setting up Maven's Memory Usage
@@ -28,7 +37,9 @@ If you don't run this, you may see errors like the following:
You can fix this by setting the `MAVEN_OPTS` variable as discussed before.
-**Note:** *For Java 8 and above this step is not required.*
+**Note:**
+* *For Java 8 and above this step is not required.*
+* *If using `build/mvn` and `MAVEN_OPTS` is not already set, the script will set it for you automatically.*
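
For reference, a manual setting like the one described earlier on this page might look as follows (the exact values shown here are an assumption for illustration, not taken from this diff):

{% highlight bash %}
# Give Maven extra heap and PermGen space before building (values are illustrative)
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
{% endhighlight %}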
# Specifying the Hadoop Version
@@ -84,7 +95,7 @@ mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Dyarn.version=2.2.0 -DskipTests
# Building With Hive and JDBC Support
To enable Hive integration for Spark SQL along with its JDBC server and CLI,
add the `-Phive` and `-Phive-thriftserver` profiles to your existing build options.
-By default Spark will build with Hive 0.13.1 bindings. You can also build for
+By default Spark will build with Hive 0.13.1 bindings. You can also build for
Hive 0.12.0 using the `-Phive-0.12.0` profile.
{% highlight bash %}
# Apache Hadoop 2.4.X with Hive 13 support
@@ -106,7 +117,7 @@ supported in Scala 2.11 builds.
# Spark Tests in Maven
-Tests are run by default via the [ScalaTest Maven plugin](http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin).
+Tests are run by default via the [ScalaTest Maven plugin](http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin).
Some of the tests require Spark to be packaged first, so always run `mvn package` with `-DskipTests` the first time. The following is an example of a correct (build, test) sequence:
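
A minimal sketch of such a sequence, assuming the same YARN/Hadoop profiles used elsewhere on this page:

{% highlight bash %}
# Package first with tests skipped so test dependencies are available
mvn -Pyarn -Phadoop-2.3 -DskipTests clean package
# Then run the tests with the same profiles
mvn -Pyarn -Phadoop-2.3 test
{% endhighlight %}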
@@ -124,7 +135,7 @@ We use the scala-maven-plugin which supports incremental and continuous compilat
mvn scala:cc
-should run continuous compilation (i.e. wait for changes). However, this has not been tested
+should run continuous compilation (i.e. wait for changes). However, this has not been tested
extensively. A couple of gotchas to note:
* it only scans the paths `src/main` and `src/test` (see
[docs](http://scala-tools.org/mvnsites/maven-scala-plugin/usage_cc.html)), so it will only work
@@ -157,9 +168,9 @@ The debian package can then be found under assembly/target. We added the short c
Running only Java 8 tests and nothing else.
mvn install -DskipTests -Pjava8-tests
-
-Java 8 tests are run when `-Pjava8-tests` profile is enabled, they will run in spite of `-DskipTests`.
-For these tests to run your system must have a JDK 8 installation.
+
+Java 8 tests are run when `-Pjava8-tests` profile is enabled, they will run in spite of `-DskipTests`.
+For these tests to run your system must have a JDK 8 installation.
If you have JDK 8 installed but it is not the system default, you can set JAVA_HOME to point to JDK 8 before running the tests.
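
For example, assuming JDK 8 lives at a hypothetical path such as `/usr/lib/jvm/java-8`:

{% highlight bash %}
# Point the build at a JDK 8 installation (the path is an assumption for illustration)
export JAVA_HOME=/usr/lib/jvm/java-8
mvn install -DskipTests -Pjava8-tests
{% endhighlight %}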
# Building for PySpark on YARN
@@ -171,7 +182,7 @@ then ship it over to the cluster. We are investigating the exact cause for this.
# Packaging without Hadoop Dependencies for YARN
-The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself.
+The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself.
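
A sketch of such a build, assuming the same profile combination used in the examples above, with `-Phadoop-provided` added:

{% highlight bash %}
# Build the assembly without bundling Hadoop and its ecosystem projects
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phadoop-provided -DskipTests clean package
{% endhighlight %}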
# Building with SBT
@@ -182,22 +193,22 @@ compilation. More advanced developers may wish to use SBT.
The SBT build is derived from the Maven POM files, and so the same Maven profiles and variables
can be set to control the SBT build. For example:
- sbt/sbt -Pyarn -Phadoop-2.3 assembly
+ build/sbt -Pyarn -Phadoop-2.3 assembly
# Testing with SBT
-Some of the tests require Spark to be packaged first, so always run `sbt/sbt assembly` the first time. The following is an example of a correct (build, test) sequence:
+Some of the tests require Spark to be packaged first, so always run `build/sbt assembly` the first time. The following is an example of a correct (build, test) sequence:
- sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver assembly
- sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver test
+ build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver assembly
+ build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver test
To run only a specific test suite, use the following:
- sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver "test-only org.apache.spark.repl.ReplSuite"
+ build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver "test-only org.apache.spark.repl.ReplSuite"
To run the test suites of a specific sub-project, use the following:
- sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver core/test
+ build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver core/test
# Speeding up Compilation with Zinc
@@ -206,3 +217,9 @@ compiler. When run locally as a background process, it speeds up builds of Scala
like Spark. Developers who regularly recompile Spark with Maven will be the most interested in
Zinc. The project site gives instructions for building and running `zinc`; OS X users can
install it using `brew install zinc`.
+
+If using the `build/mvn` script, `zinc` will automatically be downloaded and leveraged for all
+builds. This process will auto-start after the first time `build/mvn` is called and bind to port
+3030 unless the `ZINC_PORT` environment variable is set. The `zinc` process can subsequently be
+shut down at any time by running `build/zinc-<version>/bin/zinc -shutdown` and will automatically
+restart whenever `build/mvn` is called.
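
As an illustration of the port override described above (the port number is arbitrary):

{% highlight bash %}
# Run build/mvn with zinc bound to a non-default port for this invocation
ZINC_PORT=3031 build/mvn -DskipTests clean package
{% endhighlight %}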