Diffstat (limited to 'docs/index.md')
 docs/index.md | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+), 0 deletions(-)
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000000..ed9953a590
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,96 @@
+---
+layout: global
+title: Spark Overview
+---
+
+{% comment %}
+TODO(andyk): Rewrite to make the Java API a first class part of the story.
+{% endcomment %}
+
+Spark is a MapReduce-like cluster computing framework designed for low-latency iterative jobs and interactive use from an
+interpreter. It provides clean, language-integrated APIs in Scala and Java, with a rich array of parallel operators. Spark can
+run on top of the [Apache Mesos](http://incubator.apache.org/mesos/) cluster manager,
+[Hadoop YARN](http://hadoop.apache.org/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/YARN.html),
+Amazon EC2, or without an independent resource manager ("standalone mode").
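+
+As a rough sketch of what the Scala API looks like, here is a minimal word count. The `SparkContext`
+variable `sc` and the input path are placeholders here; see the programming guides below for how to
+set them up:
+
+    val file = sc.textFile("hdfs://...")               // placeholder input path
+    val counts = file.flatMap(line => line.split(" ")) // split each line into words
+                     .map(word => (word, 1))           // pair each word with a count of 1
+                     .reduceByKey(_ + _)               // sum the counts for each word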
+
+# Downloading
+
+Get Spark by visiting the [downloads page](http://spark-project.org/downloads.html) of the Spark website. This documentation is for Spark version {{site.SPARK_VERSION}}.
+
+# Building
+
+Spark requires [Scala {{site.SCALA_VERSION}}](http://www.scala-lang.org/). You will need to have Scala's `bin` directory in your `PATH`,
+or to set the `SCALA_HOME` environment variable to point to where you've installed Scala. Scala must
+also be accessible through one of these methods on the slave nodes in your cluster.
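+
+For example, you might set up your environment like this before building (the install path below is
+only a placeholder for wherever you've unpacked Scala):
+
+    export SCALA_HOME=/usr/local/scala-{{site.SCALA_VERSION}}  # placeholder path
+    export PATH=$SCALA_HOME/bin:$PATH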
+
+Spark uses [Simple Build Tool](https://github.com/harrah/xsbt/wiki), which is bundled with it. To compile the code, go into the top-level Spark directory and run
+
+ sbt/sbt package
+
+# Testing the Build
+
+Spark comes with a number of sample programs in the `examples` directory.
+To run one of the samples, use `./run <class> <params>` in the top-level Spark directory
+(the `run` script sets up the appropriate paths and launches that program).
+For example, `./run spark.examples.SparkPi` will run a sample program that estimates Pi. Each
+example prints usage help if it is run without parameters.
+
+Note that all of the sample programs take a `<master>` parameter specifying the cluster URL
+to connect to. This can be a [URL for a distributed cluster](scala-programming-guide.html#master-urls),
+`local` to run locally with one thread, or `local[N]` to run locally with N threads. You should start by
+using `local` for testing.
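+
+For example, to run the Pi example locally with two threads:
+
+    ./run spark.examples.SparkPi local[2]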
+
+Finally, Spark can be used interactively from a modified version of the Scala interpreter that you can start through
+`./spark-shell`. This is a great way to learn Spark.
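+
+For example, a first session might look something like this (the shell provides a `SparkContext`
+called `sc`; the dataset here is just a stand-in):
+
+    scala> val data = sc.parallelize(1 to 1000)
+    scala> data.filter(_ % 2 == 0).count()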
+
+# A Note About Hadoop Versions
+
+Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
+storage systems. Because the HDFS protocol has changed in different versions of
+Hadoop, you must build Spark against the same version that your cluster runs.
+You can change the version by setting the `HADOOP_VERSION` variable at the top
+of `project/SparkBuild.scala`, then rebuilding Spark (`sbt/sbt clean compile`).
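+
+For illustration, the relevant line in `project/SparkBuild.scala` looks roughly like this (the version
+shown is only an example; set it to match your cluster):
+
+    // in project/SparkBuild.scala
+    val HADOOP_VERSION = "1.0.4"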
+
+# Where to Go from Here
+
+**Programming guides:**
+
+* [Quick Start](quick-start.html): a quick introduction to the Spark API; start here!
+* [Spark Programming Guide](scala-programming-guide.html): an overview of Spark concepts, and details on the Scala API
+* [Java Programming Guide](java-programming-guide.html): using Spark from Java
+
+**Deployment guides:**
+
+* [Running Spark on Amazon EC2](ec2-scripts.html): scripts that let you launch a cluster on EC2 in about 5 minutes
+* [Standalone Deploy Mode](spark-standalone.html): launch a standalone cluster quickly without a third-party cluster manager
+* [Running Spark on Mesos](running-on-mesos.html): deploy a private cluster using
+ [Apache Mesos](http://incubator.apache.org/mesos)
+* [Running Spark on YARN](running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN)
+
+**Other documents:**
+
+* [Configuration](configuration.html): customize Spark via its configuration system
+* [Tuning Guide](tuning.html): best practices to optimize performance and memory use
+* [API Docs (Scaladoc)](api/core/index.html)
+* [Bagel](bagel-programming-guide.html): an implementation of Google's Pregel on Spark
+* [Contributing to Spark](contributing-to-spark.html)
+
+**External resources:**
+
+* [Spark Homepage](http://www.spark-project.org)
+* [Mailing List](http://groups.google.com/group/spark-users): ask questions about Spark here
+* [AMP Camp](http://ampcamp.berkeley.edu/): a two-day training camp at UC Berkeley that featured talks and exercises
+ about Spark, Shark, Mesos, and more. [Videos](http://ampcamp.berkeley.edu/agenda-2012),
+ [slides](http://ampcamp.berkeley.edu/agenda-2012), and [exercises](http://ampcamp.berkeley.edu/exercises-2012) are
+ available online for free.
+* [Code Examples](http://spark-project.org/examples.html): more are also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/spark/examples) of Spark
+* [Paper Describing the Spark System](http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)
+
+# Community
+
+To get help using Spark or keep up with Spark development, sign up for the [spark-users mailing list](http://groups.google.com/group/spark-users).
+
+If you're in the San Francisco Bay Area, there's a regular [Spark meetup](http://www.meetup.com/spark-users/) every few weeks. Come by to meet the developers and other users.
+
+Finally, if you'd like to contribute code to Spark, read [how to contribute](contributing-to-spark.html).