 CONTRIBUTING.md                                                   | 12
 README.md                                                         | 78
 docs/README.md                                                    |  5
 docs/_config.yml                                                  |  4
 docs/_layouts/global.html                                         |  2
 docs/building-spark.md (renamed from docs/building-with-maven.md) | 20
 docs/hadoop-third-party-distributions.md                          |  2
 docs/index.md                                                     |  4
 docs/running-on-yarn.md                                           |  2
 docs/streaming-kinesis-integration.md                             |  2
 make-distribution.sh                                              |  2
 11 files changed, 60 insertions(+), 73 deletions(-)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000000..c6b4aa5344
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,12 @@
+## Contributing to Spark
+
+Contributions via GitHub pull requests are gladly accepted from their original
+author. Along with any pull requests, please state that the contribution is
+your original work and that you license the work to the project under the
+project's open source license. Whether or not you state this explicitly, by
+submitting any copyrighted material via pull request, email, or other means
+you agree to license the material under the project's open source license and
+warrant that you have the legal authority to do so.
+
+Please see [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
+for more information.
diff --git a/README.md b/README.md
index 5b09ad8684..b05bbfb5a5 100644
--- a/README.md
+++ b/README.md
@@ -13,16 +13,19 @@ and Spark Streaming for stream processing.
## Online Documentation
You can find the latest Spark documentation, including a programming
-guide, on the project webpage at <http://spark.apache.org/documentation.html>.
+guide, on the [project web page](http://spark.apache.org/documentation.html).
This README file only contains basic setup instructions.
## Building Spark
-Spark is built on Scala 2.10. To build Spark and its example programs, run:
+Spark is built using [Apache Maven](http://maven.apache.org/).
+To build Spark and its example programs, run:
-    ./sbt/sbt assembly
+    mvn -DskipTests clean package
(You do not need to do this if you downloaded a pre-built package.)
+More detailed documentation is available from the project site, at
+["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).
## Interactive Scala Shell
@@ -71,73 +74,24 @@ can be run using:
    ./dev/run-tests
+Please see the guidance on how to
+[run all automated tests](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-AutomatedTesting).
+
## A Note About Hadoop Versions
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
-You can change the version by setting `-Dhadoop.version` when building Spark.
-
-For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
-versions without YARN, use:
-
-    # Apache Hadoop 1.2.1
-    $ sbt/sbt -Dhadoop.version=1.2.1 assembly
-
-    # Cloudera CDH 4.2.0 with MapReduce v1
-    $ sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.2.0 assembly
-
-For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
-with YARN, also set `-Pyarn`:
-
-    # Apache Hadoop 2.0.5-alpha
-    $ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly
-
-    # Cloudera CDH 4.2.0 with MapReduce v2
-    $ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly
-
-    # Apache Hadoop 2.2.X and newer
-    $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
-
-When developing a Spark application, specify the Hadoop version by adding the
-"hadoop-client" artifact to your project's dependencies. For example, if you're
-using Hadoop 1.2.1 and build your application using SBT, add this entry to
-`libraryDependencies`:
-
- "org.apache.hadoop" % "hadoop-client" % "1.2.1"
-If your project is built with Maven, add this to your POM file's `<dependencies>` section:
-
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
-      <version>1.2.1</version>
-    </dependency>
-
-
-## A Note About Thrift JDBC server and CLI for Spark SQL
-
-Spark SQL supports Thrift JDBC server and CLI.
-See sql-programming-guide.md for more information about using the JDBC server and CLI.
-You can use those features by setting `-Phive` when building Spark as follows.
-
-    $ sbt/sbt -Phive assembly
+Please refer to the build documentation at
+["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
+for detailed guidance on building for a particular distribution of Hadoop, including
+building for particular Hive and Hive Thriftserver distributions. See also
+["Third Party Hadoop Distributions"](http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html)
+for guidance on building a Spark application that works with a particular
+distribution.
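+
+For example, one possible invocation for a YARN-enabled Hadoop 2.2 cluster (a
+sketch only; check the linked documentation for the profiles your Hadoop
+version requires) is:
+
+    mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package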
## Configuration
Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.
-
-
-## Contributing to Spark
-
-Contributions via GitHub pull requests are gladly accepted from their original
-author. Along with any pull requests, please state that the contribution is
-your original work and that you license the work to the project under the
-project's open source license. Whether or not you state this explicitly, by
-submitting any copyrighted material via pull request, email, or other means
-you agree to license the material under the project's open source license and
-warrant that you have the legal authority to do so.
-
-Please see [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
-for more information.
diff --git a/docs/README.md b/docs/README.md
index 0a0126c574..fdc89d2eb7 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -23,8 +23,9 @@ The markdown code can be compiled to HTML using the [Jekyll tool](http://jekyllrb.com).
To use the `jekyll` command, you will need to have Jekyll installed.
The easiest way to do this is via a Ruby Gem, see the
[jekyll installation instructions](http://jekyllrb.com/docs/installation).
-If not already installed, you need to install `kramdown` with `sudo gem install kramdown`.
-Execute `jekyll` from the `docs/` directory. Compiling the site with Jekyll will create a directory
+If not already installed, you need to install `kramdown` and `jekyll-redirect-from` Gems
+with `sudo gem install kramdown jekyll-redirect-from`.
+Execute `jekyll build` from the `docs/` directory. Compiling the site with Jekyll will create a directory
called `_site` containing index.html as well as the rest of the compiled files.
You can modify the default Jekyll build as follows:
diff --git a/docs/_config.yml b/docs/_config.yml
index 45b78fe724..d3ea2625c7 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -1,5 +1,7 @@
-pygments: true
+highlighter: pygments
markdown: kramdown
+gems:
+ - jekyll-redirect-from
# These allow the documentation to be updated with newer releases
# of Spark, Scala, and Mesos.
diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index b30ab1e521..a53e8a775b 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -109,7 +109,7 @@
<li><a href="hardware-provisioning.html">Hardware Provisioning</a></li>
<li><a href="hadoop-third-party-distributions.html">3<sup>rd</sup>-Party Hadoop Distros</a></li>
<li class="divider"></li>
- <li><a href="building-with-maven.html">Building Spark with Maven</a></li>
+ <li><a href="building-spark.html">Building Spark</a></li>
<li><a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">Contributing to Spark</a></li>
</ul>
</li>
diff --git a/docs/building-with-maven.md b/docs/building-spark.md
index bce7412c7d..2378092d4a 100644
--- a/docs/building-with-maven.md
+++ b/docs/building-spark.md
@@ -1,6 +1,7 @@
---
layout: global
-title: Building Spark with Maven
+title: Building Spark
+redirect_from: "building-with-maven.html"
---
* This will become a table of contents (this text will be scraped).
@@ -159,4 +160,21 @@ then ship it over to the cluster. We are investigating the exact cause for this.
The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself.
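+
+As a sketch (one possible invocation, with profiles adjusted to your
+environment), a YARN build that leaves the Hadoop ecosystem out of the
+assembly could look like:
+
+    mvn -Pyarn -Phadoop-provided -DskipTests clean package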
+# Building with SBT
+Maven is the official recommendation for packaging Spark, and is the "build of reference".
+But SBT is supported for day-to-day development since it can provide much faster iterative
+compilation. More advanced developers may wish to use SBT.
+
+The SBT build is derived from the Maven POM files, and so the same Maven profiles and variables
+can be set to control the SBT build. For example:
+
+    sbt/sbt -Pyarn -Phadoop-2.3 compile
+
+# Speeding up Compilation with Zinc
+
+[Zinc](https://github.com/typesafehub/zinc) is a long-running server version of SBT's incremental
+compiler. When run locally as a background process, it speeds up builds of Scala-based projects
+like Spark. Developers who regularly recompile Spark with Maven will be the most interested in
+Zinc. The project site gives instructions for building and running `zinc`; OS X users can
+install it using `brew install zinc`.
\ No newline at end of file
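+
+A minimal usage sketch (based on the Zinc project's README, not part of this
+commit; assumes `zinc` is installed and on your `PATH`):
+
+    zinc -start                      # launch the background compile server
+    mvn -DskipTests clean package    # Scala compilation can reuse the warm server
+    zinc -shutdown                   # stop the server when finished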
diff --git a/docs/hadoop-third-party-distributions.md b/docs/hadoop-third-party-distributions.md
index ab1023b8f1..dd73e9dc54 100644
--- a/docs/hadoop-third-party-distributions.md
+++ b/docs/hadoop-third-party-distributions.md
@@ -11,7 +11,7 @@ with these distributions:
When compiling Spark, you'll need to specify the Hadoop version by defining the `hadoop.version`
property. For certain versions, you will need to specify additional profiles. For more detail,
-see the guide on [building with maven](building-with-maven.html#specifying-the-hadoop-version):
+see the guide on [building with maven](building-spark.html#specifying-the-hadoop-version):
    mvn -Dhadoop.version=1.0.4 -DskipTests clean package
    mvn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
diff --git a/docs/index.md b/docs/index.md
index 7fe6b43d32..e8ebadbd4e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -12,7 +12,7 @@ It also supports a rich set of higher-level tools including [Spark SQL](sql-prog
Get Spark from the [downloads page](http://spark.apache.org/downloads.html) of the project website. This documentation is for Spark version {{site.SPARK_VERSION}}. The downloads page
contains Spark packages for many popular HDFS versions. If you'd like to build Spark from
-scratch, visit [building Spark with Maven](building-with-maven.html).
+scratch, visit [Building Spark](building-spark.html).
Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It's easy to run
locally on one machine --- all you need is to have `java` installed on your system `PATH`,
@@ -105,7 +105,7 @@ options for deployment:
* [3<sup>rd</sup> Party Hadoop Distributions](hadoop-third-party-distributions.html): using common Hadoop distributions
* Integration with other storage systems:
* [OpenStack Swift](storage-openstack-swift.html)
-* [Building Spark with Maven](building-with-maven.html): build Spark using the Maven system
+* [Building Spark](building-spark.html): build Spark using the Maven system
* [Contributing to Spark](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark)
**External Resources:**
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 212248bcce..74bcc2eeb6 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -11,7 +11,7 @@ was added to Spark in version 0.6.0, and improved in subsequent releases.
Running Spark-on-YARN requires a binary distribution of Spark which is built with YARN support.
Binary distributions can be downloaded from the Spark project website.
-To build Spark yourself, refer to the [building with Maven guide](building-with-maven.html).
+To build Spark yourself, refer to [Building Spark](building-spark.html).
# Configuration
diff --git a/docs/streaming-kinesis-integration.md b/docs/streaming-kinesis-integration.md
index c6090d9ec3..379eb513d5 100644
--- a/docs/streaming-kinesis-integration.md
+++ b/docs/streaming-kinesis-integration.md
@@ -108,7 +108,7 @@ A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or more shards
#### Running the Example
To run the example,
-- Download Spark source and follow the [instructions](building-with-maven.html) to build Spark with profile *-Pkinesis-asl*.
+- Download Spark source and follow the [instructions](building-spark.html) to build Spark with profile *-Pkinesis-asl*.
        mvn -Pkinesis-asl -DskipTests clean package
diff --git a/make-distribution.sh b/make-distribution.sh
index 9b012b9222..884659954a 100755
--- a/make-distribution.sh
+++ b/make-distribution.sh
@@ -40,7 +40,7 @@ function exit_with_usage {
echo ""
echo "usage:"
echo "./make-distribution.sh [--name] [--tgz] [--with-tachyon] <maven build options>"
- echo "See Spark's \"Building with Maven\" doc for correct Maven options."
+ echo "See Spark's \"Building Spark\" doc for correct Maven options."
echo ""
exit 1
}