commit e47e11720f7f51d720f98ad7dce959ac3ceef731
parent 30362a21e7e181c058a4f5979bc85f3b18fb5e70
tree   b93ebc673b2f0ff2da61f1cc38b4925cfe76310e (limited to 'docs/index.md')
Author: Matei Zaharia <matei@eecs.berkeley.edu>
Date:   2012-09-25 15:46:18 -0700
Documentation updates
Diffstat (limited to 'docs/index.md')
 docs/index.md | 41 +++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 20 deletions(-)
diff --git a/docs/index.md b/docs/index.md
index 69d55e505e..b9d9a8a36f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -7,7 +7,7 @@ title: Spark Overview
 TODO(andyk): Rewrite to make the Java API a first class part of the story.
 {% endcomment %}
 
-Spark is a MapReduce-like cluster computing framework designed to support low-latency iterative jobs and interactive use from an interpreter. It is written in [Scala](http://www.scala-lang.org), a high-level language for the JVM, and exposes a clean language-integrated syntax that makes it easy to write parallel jobs. Spark runs on top of the [Apache Mesos](http://incubator.apache.org/mesos/) cluster manager, Hadoop YARN, or without an independent resource manager (i.e., in "standalone mode").
+Spark is a MapReduce-like cluster computing framework designed to support low-latency iterative jobs and interactive use from an interpreter. It exposes clean language-integrated APIs in [Scala](http://www.scala-lang.org) and Java, providing a wide array of parallel operations. Spark can run on top of the [Apache Mesos](http://incubator.apache.org/mesos/) cluster manager, Hadoop YARN, or without an independent resource manager ("standalone mode").
 
 # Downloading
 
@@ -15,19 +15,14 @@ Get Spark by checking out the master branch of the Git repository, using `git cl
 
 # Building
 
-Spark requires [Scala 2.9](http://www.scala-lang.org/).
-In addition, to run Spark on a cluster, you will need to install [Mesos](http://incubator.apache.org/mesos/), using the steps in
-[Running Spark on Mesos]({{HOME_PATH}}running-on-mesos.html). However, if you just want to run Spark on a single machine (possibly using multiple cores),
-you do not need Mesos.
-
-To build and run Spark, you will need to have Scala's `bin` directory in your `PATH`,
+Spark requires [Scala 2.9.2](http://www.scala-lang.org/). You will need to have Scala's `bin` directory in your `PATH`,
 or you will need to set the `SCALA_HOME` environment variable to point
 to where you've installed Scala. Scala must be accessible through one
-of these methods on Mesos slave nodes as well as on the master.
+of these methods on slave nodes on your cluster.
 
 Spark uses [Simple Build Tool](https://github.com/harrah/xsbt/wiki), which is bundled with it. To compile the code, go into the top-level Spark directory and run
 
-    sbt/sbt compile
+    sbt/sbt package
 
 # Testing the Build
 
@@ -44,7 +39,7 @@ thread, or `local[N]` to run locally with N threads. You should start by using `
 Finally, Spark can be used interactively from a modified version of the Scala
 interpreter that you can start through `./spark-shell`. This is a great way to learn Spark.
 
-# A Note About Hadoop
+# A Note About Hadoop Versions
 
 Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
 storage systems. Because the HDFS protocol has changed in different versions of
@@ -54,23 +49,29 @@ of `project/SparkBuild.scala`, then rebuilding Spark (`sbt/sbt clean compile`).
 # Where to Go from Here
 
-* [Spark Programming Guide]({{HOME_PATH}}scala-programming-guide.html): how to get started using Spark, and details on the API
+Programming guides:
+* [Spark Programming Guide]({{HOME_PATH}}scala-programming-guide.html): how to get started using Spark, and details on the Scala API
+* [Java Programming Guide]({{HOME_PATH}}java-programming-guide.html): using Spark from Java
+
+Deployment guides:
 * [Running Spark on Amazon EC2]({{HOME_PATH}}ec2-scripts.html): scripts that let you launch a cluster on EC2 in about 5 minutes
-* [Running Spark on Mesos]({{HOME_PATH}}running-on-mesos.html): instructions on how to deploy to a private cluster
-* [Running Spark on YARN]({{HOME_PATH}}running-on-yarn.html): instructions on how to run Spark on top of a YARN cluster
-* [Spark Standalone Mode]({{HOME_PATH}}spark-standalone.html): instructions on running Spark without Mesos
-* [Configuration]({{HOME_PATH}}configuration.html): How to set up and customize Spark via its configuration system.
-* [Bagel Programming Guide]({{HOME_PATH}}bagel-programming-guide.html): implementation of Google's Pregel on Spark
-* [Spark Debugger]({{HOME_PATH}}spark-debugger.html): experimental work on a debugger for Spark jobs
+* [Standalone Deploy Mode]({{HOME_PATH}}spark-standalone.html): launch a standalone cluster quickly without Mesos
+* [Running Spark on Mesos]({{HOME_PATH}}running-on-mesos.html): deploy a private cluster using
+  [Apache Mesos](http://incubator.apache.org/mesos)
+* [Running Spark on YARN]({{HOME_PATH}}running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN)
+
+Miscellaneous:
+* [Configuration]({{HOME_PATH}}configuration.html): customize Spark via its configuration system.
+* [Bagel]({{HOME_PATH}}bagel-programming-guide.html): an implementation of Google's Pregel on Spark
 * [Contributing to Spark](contributing-to-spark.html)
 
 # Other Resources
 
 * [Spark Homepage](http://www.spark-project.org)
-* [AMP Camp](http://ampcamp.berkeley.edu/) - In 2012, the AMP Lab hosted the first AMP Camp which featured talks and hands-on exercises about Spark, Shark, Mesos, and more. [Videos, slides](http://ampcamp.berkeley.edu/agenda) and the [exercises](http://ampcamp.berkeley.edu/exercises) are all available online now. Going through the videos and exercises is a great way to sharpen your Spark skills.
+* [AMP Camp](http://ampcamp.berkeley.edu/) - In 2012, the AMP Lab hosted the first AMP Camp which featured talks and hands-on exercises about Spark, Shark, Mesos, and more. [Videos, slides](http://ampcamp.berkeley.edu/agenda) and the [exercises](http://ampcamp.berkeley.edu/exercises) are all available online, and provide a great introduction to Spark.
 * [Paper describing the programming model](http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)
-* [Code Examples](http://spark-project.org/examples.html) (more also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/spark/examples) of the Spark codebase)
-* [Mailing List](http://groups.google.com/group/spark-users)
+* [Code Examples](http://spark-project.org/examples.html): more are also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/spark/examples) of Spark
+* [Mailing List](http://groups.google.com/group/spark-users): ask here for help
 
 # Community
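
To make the "Testing the Build" instructions in the diff concrete: the `local` and `local[N]` master strings can be passed straight to a `SparkContext`. Below is a minimal word-count sketch against the Scala API of this era (the top-level `spark` package, before the move to `org.apache.spark`); the application name and input path are illustrative placeholders.

    import spark.SparkContext
    import spark.SparkContext._  // implicits that enable reduceByKey on pair RDDs

    object LocalWordCount {
      def main(args: Array[String]) {
        // "local[2]" runs Spark in-process with two worker threads;
        // plain "local" would use a single thread.
        val sc = new SparkContext("local[2]", "Local Word Count")

        val counts = sc.textFile("README.md")            // placeholder input file
                       .flatMap(line => line.split(" ")) // one record per word
                       .map(word => (word, 1))
                       .reduceByKey(_ + _)               // sum the counts per word

        counts.take(10).foreach(println)
        sc.stop()
      }
    }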
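
The same operations work interactively: `./spark-shell` starts a Scala REPL with a context already created, bound by convention to a variable named `sc` in Spark shells of this period (an assumption from contemporaneous docs; the path below is a placeholder).

    scala> val lines = sc.textFile("docs/index.md")   // placeholder path
    scala> lines.filter(_.contains("Spark")).count()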
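
Finally, the Hadoop-version note refers to a one-line edit in `project/SparkBuild.scala`. A hedged sketch, assuming the `HADOOP_VERSION` variable that Spark docs of this period point to (the version string is an example, not the project default):

    // In project/SparkBuild.scala: match the Hadoop version of your HDFS cluster.
    // Example value only; check what your cluster actually runs.
    val HADOOP_VERSION = "1.0.3"

After changing it, rebuild with `sbt/sbt clean compile` as the page describes.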