author    Jacek Laskowski <jacek@japila.pl>    2015-09-08 14:38:10 +0100
committer Sean Owen <sowen@cloudera.com>    2015-09-08 14:38:10 +0100
commit    6ceed852ab716d8acc46ce90cba9cfcff6d3616f (patch)
tree      d893483e0fbb3601d4bde3aaf30a849b641ac24f /docs/building-spark.md
parent    9d8e838d883ed21f9ef562e7e3ac074c7e4adb88 (diff)
Docs small fixes
Author: Jacek Laskowski <jacek@japila.pl>

Closes #8629 from jaceklaskowski/docs-fixes.
Diffstat (limited to 'docs/building-spark.md')
-rw-r--r--  docs/building-spark.md | 23
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/docs/building-spark.md b/docs/building-spark.md
index f133eb96d9..4db32cfd62 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -61,12 +61,13 @@ If you don't run this, you may see errors like the following:
You can fix this by setting the `MAVEN_OPTS` variable as discussed before.
**Note:**
-* *For Java 8 and above this step is not required.*
-* *If using `build/mvn` and `MAVEN_OPTS` were not already set, the script will automate this for you.*
+
+* For Java 8 and above this step is not required.
+* If using `build/mvn` with no `MAVEN_OPTS` set, the script will automate this for you.
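For reference, one way to set `MAVEN_OPTS` before invoking Maven is sketched below; the exact memory values are illustrative only and should be tuned to your machine and Spark version:

    export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"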
# Specifying the Hadoop Version
-Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the "hadoop.version" property. If unset, Spark will build against Hadoop 2.2.0 by default. Note that certain build profiles are required for particular Hadoop versions:
+Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the `hadoop.version` property. If unset, Spark will build against Hadoop 2.2.0 by default. Note that certain build profiles are required for particular Hadoop versions:
<table class="table">
<thead>
@@ -91,7 +92,7 @@ mvn -Dhadoop.version=1.2.1 -Phadoop-1 -DskipTests clean package
mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -Phadoop-1 -DskipTests clean package
{% endhighlight %}
-You can enable the "yarn" profile and optionally set the "yarn.version" property if it is different from "hadoop.version". Spark only supports YARN versions 2.2.0 and later.
+You can enable the `yarn` profile and optionally set the `yarn.version` property if it is different from `hadoop.version`. Spark only supports YARN versions 2.2.0 and later.
Examples:
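The unchanged example commands are elided from this hunk; as a rough sketch (the Hadoop/YARN versions here are assumptions), enabling the `yarn` profile looks like:

    # Apache Hadoop 2.4.X with YARN (versions are illustrative)
    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package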
@@ -125,7 +126,7 @@ mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -Dskip
# Building for Scala 2.11
To produce a Spark package compiled with Scala 2.11, use the `-Dscala-2.11` property:
- dev/change-scala-version.sh 2.11
+ ./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
Spark does not yet support its JDBC component for Scala 2.11.
@@ -163,11 +164,9 @@ the `spark-parent` module).
Thus, the full flow for running continuous-compilation of the `core` submodule may look more like:
-```
- $ mvn install
- $ cd core
- $ mvn scala:cc
-```
+ $ mvn install
+ $ cd core
+ $ mvn scala:cc
# Building Spark with IntelliJ IDEA or Eclipse
@@ -193,11 +192,11 @@ then ship it over to the cluster. We are investigating the exact cause for this.
# Packaging without Hadoop Dependencies for YARN
-The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself.
+The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with `yarn.application.classpath`. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself.
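As a minimal sketch (not part of this diff; the Hadoop profile and version shown are assumptions), the `hadoop-provided` profile is activated like any other Maven profile:

    mvn -Pyarn -Phadoop-provided -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package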
# Building with SBT
-Maven is the official recommendation for packaging Spark, and is the "build of reference".
+Maven is the official build tool recommended for packaging Spark, and is the *build of reference*.
But SBT is supported for day-to-day development since it can provide much faster iterative
compilation. More advanced developers may wish to use SBT.
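For illustration only (the profiles and versions are assumptions carried over from the Maven examples above), an equivalent SBT packaging run looks roughly like:

    # Build the assembly with SBT, reusing the Maven profiles
    build/sbt -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly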