Diffstat (limited to 'docs/index.md')
 docs/index.md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/docs/index.md b/docs/index.md
index 3cf9cc1c64..7d73929940 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -5,12 +5,14 @@ title: Spark Overview
Apache Spark is a fast and general-purpose cluster computing system.
It provides high-level APIs in [Scala](scala-programming-guide.html), [Java](java-programming-guide.html), and [Python](python-programming-guide.html) that make parallel jobs easy to write, and an optimized engine that supports general computation graphs.
-Spark can run on the Apache Mesos cluster manager, Hadoop YARN, Amazon EC2, or without an independent resource manager ("standalone mode").
+It also supports a rich set of higher-level tools including [Shark](http://shark.cs.berkeley.edu) (Hive on Spark), [MLlib](mllib-guide.html) for machine learning, [Bagel](bagel-programming-guide.html) for graph processing, and [Spark Streaming](streaming-programming-guide.html).
# Downloading
Get Spark by visiting the [downloads page](http://spark.incubator.apache.org/downloads.html) of the Apache Spark site. This documentation is for Spark version {{site.SPARK_VERSION}}.
+Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). All you need to run it is to have `java` installed on your system `PATH`, or the `JAVA_HOME` environment variable pointing to a Java installation.
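A quick way to verify that prerequisite (a minimal sketch; the `JAVA_HOME` path below is only an example, not a required location):

    # Confirm that java is reachable on the PATH
    java -version

    # Or point JAVA_HOME at a Java installation (example path)
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
    export PATH=$JAVA_HOME/bin:$PATH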
+
# Building
Spark uses [Simple Build Tool](http://www.scala-sbt.org), which is bundled with it. To compile the code, go into the top-level Spark directory and run
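That command is the assembly build, the same `sbt/sbt assembly` invocation that the Hadoop-version examples later on this page extend:

    sbt/sbt assembly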
@@ -35,6 +37,15 @@ or `local` to run locally with one thread, or `local[N]` to run locally with N t
Finally, Spark can be used interactively through modified versions of the Scala shell (`./spark-shell`) or
Python interpreter (`./pyspark`). These are a great way to learn Spark.
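For example, a first session in the Scala shell might look like this (a sketch; the shell pre-creates the `sc` SparkContext for you):

    $ ./spark-shell
    scala> sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
    res0: Long = 500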
+# Running on a Cluster
+
+Spark supports several options for deployment:
+
+* [Amazon EC2](ec2-scripts.html): our scripts let you launch a cluster in about 5 minutes (see the sketch after this list)
+* [Standalone Deploy Mode](spark-standalone.html): simplest way to deploy Spark on a private cluster
+* [Apache Mesos](running-on-mesos.html)
+* [Hadoop YARN](running-on-yarn.html)
+
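As a hedged illustration of the EC2 option (the keypair name, identity file, cluster size, and cluster name are placeholder values):

    # Launch a small cluster with the bundled EC2 scripts
    cd ec2
    ./spark-ec2 -k my-keypair -i my-keypair.pem -s 2 launch my-spark-cluster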
# A Note About Hadoop Versions
Spark uses the Hadoop-client library to talk to HDFS and other Hadoop-supported storage systems.
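Because the bundled Hadoop client must match the version your cluster runs, you can select a version at build time. For example, a sketch of building against Hadoop 1.2.1 (the version number is illustrative):

    SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly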
@@ -50,6 +61,8 @@ In addition, if you wish to run Spark on [YARN](running-on-yarn.md), set
SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
+(Note that on Windows, you need to set each environment variable on its own line with `set`, e.g. `set SPARK_HADOOP_VERSION=1.2.1`; see the sketch below.)
+
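For instance, the YARN build above would look like this on Windows (a sketch; the exact sbt launcher invocation may differ by setup):

    set SPARK_HADOOP_VERSION=2.0.5-alpha
    set SPARK_YARN=true
    sbt\sbt assembly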
# Where to Go from Here
**Programming guides:**
@@ -90,6 +103,7 @@ In addition, if you wish to run Spark on [YARN](running-on-yarn.md), set
**External resources:**
* [Spark Homepage](http://spark.incubator.apache.org)
+* [Shark](http://shark.cs.berkeley.edu): Apache Hive over Spark
* [Mailing Lists](http://spark.incubator.apache.org/mailing-lists.html): ask questions about Spark here
* [AMP Camps](http://ampcamp.berkeley.edu/): a series of training camps at UC Berkeley that featured talks and
  exercises about Spark, Shark, Mesos, and more. [Videos](http://ampcamp.berkeley.edu/agenda-2012), slides, and exercises are available online for free.