author: Matei Zaharia <matei@eecs.berkeley.edu> 2012-09-25 15:46:18 -0700
committer: Matei Zaharia <matei@eecs.berkeley.edu> 2012-09-25 15:46:18 -0700
commit: e47e11720f7f51d720f98ad7dce959ac3ceef731
tree: b93ebc673b2f0ff2da61f1cc38b4925cfe76310e /docs/index.md
parent: 30362a21e7e181c058a4f5979bc85f3b18fb5e70
Documentation updates
Diffstat (limited to 'docs/index.md')
-rw-r--r-- docs/index.md 41
1 files changed, 21 insertions, 20 deletions
diff --git a/docs/index.md b/docs/index.md
index 69d55e505e..b9d9a8a36f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -7,7 +7,7 @@ title: Spark Overview
TODO(andyk): Rewrite to make the Java API a first class part of the story.
{% endcomment %}
-Spark is a MapReduce-like cluster computing framework designed to support low-latency iterative jobs and interactive use from an interpreter. It is written in [Scala](http://www.scala-lang.org), a high-level language for the JVM, and exposes a clean language-integrated syntax that makes it easy to write parallel jobs. Spark runs on top of the [Apache Mesos](http://incubator.apache.org/mesos/) cluster manager, Hadoop YARN, or without an independent resource manager (i.e., in "standalone mode").
+Spark is a MapReduce-like cluster computing framework designed to support low-latency iterative jobs and interactive use from an interpreter. It exposes clean language-integrated APIs in [Scala](http://www.scala-lang.org) and Java, providing a wide array of parallel operations. Spark can run on top of the [Apache Mesos](http://incubator.apache.org/mesos/) cluster manager, Hadoop YARN, or without an independent resource manager ("standalone mode").
# Downloading
@@ -15,19 +15,14 @@ Get Spark by checking out the master branch of the Git repository, using `git cl
# Building
-Spark requires [Scala 2.9](http://www.scala-lang.org/).
-In addition, to run Spark on a cluster, you will need to install [Mesos](http://incubator.apache.org/mesos/), using the steps in
-[Running Spark on Mesos]({{HOME_PATH}}running-on-mesos.html). However, if you just want to run Spark on a single machine (possibly using multiple cores),
-you do not need Mesos.
-
-To build and run Spark, you will need to have Scala's `bin` directory in your `PATH`,
+Spark requires [Scala 2.9.2](http://www.scala-lang.org/). You will need to have Scala's `bin` directory in your `PATH`,
or you will need to set the `SCALA_HOME` environment variable to point
to where you've installed Scala. Scala must be accessible through one
-of these methods on Mesos slave nodes as well as on the master.
+of these methods on slave nodes on your cluster.
Spark uses [Simple Build Tool](https://github.com/harrah/xsbt/wiki), which is bundled with it. To compile the code, go into the top-level Spark directory and run
- sbt/sbt compile
+ sbt/sbt package
# Testing the Build
@@ -44,7 +39,7 @@ thread, or `local[N]` to run locally with N threads. You should start by using `
Finally, Spark can be used interactively from a modified version of the Scala interpreter that you can start through
`./spark-shell`. This is a great way to learn Spark.
-# A Note About Hadoop
+# A Note About Hadoop Versions
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the HDFS protocol has changed in different versions of
@@ -54,23 +49,29 @@ of `project/SparkBuild.scala`, then rebuilding Spark (`sbt/sbt clean compile`).
# Where to Go from Here
-* [Spark Programming Guide]({{HOME_PATH}}scala-programming-guide.html): how to get started using Spark, and details on the API
+Programming guides:
+* [Spark Programming Guide]({{HOME_PATH}}scala-programming-guide.html): how to get started using Spark, and details on the Scala API
+* [Java Programming Guide]({{HOME_PATH}}java-programming-guide.html): using Spark from Java
+
+Deployment guides:
* [Running Spark on Amazon EC2]({{HOME_PATH}}ec2-scripts.html): scripts that let you launch a cluster on EC2 in about 5 minutes
-* [Running Spark on Mesos]({{HOME_PATH}}running-on-mesos.html): instructions on how to deploy to a private cluster
-* [Running Spark on YARN]({{HOME_PATH}}running-on-yarn.html): instructions on how to run Spark on top of a YARN cluster
-* [Spark Standalone Mode]({{HOME_PATH}}spark-standalone.html): instructions on running Spark without Mesos
-* [Configuration]({{HOME_PATH}}configuration.html): How to set up and customize Spark via its configuration system.
-* [Bagel Programming Guide]({{HOME_PATH}}bagel-programming-guide.html): implementation of Google's Pregel on Spark
-* [Spark Debugger]({{HOME_PATH}}spark-debugger.html): experimental work on a debugger for Spark jobs
+* [Standalone Deploy Mode]({{HOME_PATH}}spark-standalone.html): launch a standalone cluster quickly without Mesos
+* [Running Spark on Mesos]({{HOME_PATH}}running-on-mesos.html): deploy a private cluster using
+ [Apache Mesos](http://incubator.apache.org/mesos)
+* [Running Spark on YARN]({{HOME_PATH}}running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN)
+
+Miscellaneous:
+* [Configuration]({{HOME_PATH}}configuration.html): customize Spark via its configuration system.
+* [Bagel]({{HOME_PATH}}bagel-programming-guide.html): an implementation of Google's Pregel on Spark
* [Contributing to Spark](contributing-to-spark.html)
# Other Resources
* [Spark Homepage](http://www.spark-project.org)
-* [AMP Camp](http://ampcamp.berkeley.edu/) - In 2012, the AMP Lab hosted the first AMP Camp which featured talks and hands-on exercises about Spark, Shark, Mesos, and more. [Videos, slides](http://ampcamp.berkeley.edu/agenda) and the [exercises](http://ampcamp.berkeley.edu/exercises) are all available online now. Going through the videos and exercises is a great way to sharpen your Spark skills.
+* [AMP Camp](http://ampcamp.berkeley.edu/) - In 2012, the AMP Lab hosted the first AMP Camp which featured talks and hands-on exercises about Spark, Shark, Mesos, and more. [Videos, slides](http://ampcamp.berkeley.edu/agenda) and the [exercises](http://ampcamp.berkeley.edu/exercises) are all available online, and provide a great introduction to Spark.
* [Paper describing the programming model](http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)
-* [Code Examples](http://spark-project.org/examples.html) (more also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/spark/examples) of the Spark codebase)
-* [Mailing List](http://groups.google.com/group/spark-users)
+* [Code Examples](http://spark-project.org/examples.html): more are also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/spark/examples) of Spark
+* [Mailing List](http://groups.google.com/group/spark-users): ask here for help
# Community