Diffstat (limited to 'docs/index.md')
-rw-r--r-- | docs/index.md | 16
1 file changed, 15 insertions, 1 deletion
diff --git a/docs/index.md b/docs/index.md
index 3cf9cc1c64..7d73929940 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -5,12 +5,14 @@ title: Spark Overview
 
 Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in [Scala](scala-programming-guide.html), [Java](java-programming-guide.html), and [Python](python-programming-guide.html)
 that make parallel jobs easy to write, and an optimized engine that supports general computation graphs.
-Spark can run on the Apache Mesos cluster manager, Hadoop YARN, Amazon EC2, or without an independent resource manager ("standalone mode").
+It also supports a rich set of higher-level tools including [Shark](http://shark.cs.berkeley.edu) (Hive on Spark), [MLlib](mllib-guide.html) for machine learning, [Bagel](bagel-programming-guide.html) for graph processing, and [Spark Streaming](streaming-programming-guide.html).
 
 # Downloading
 
 Get Spark by visiting the [downloads page](http://spark.incubator.apache.org/downloads.html) of the Apache Spark site. This documentation is for Spark version {{site.SPARK_VERSION}}.
 
+Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). All you need to run it is to have `java` installed on your system `PATH`, or the `JAVA_HOME` environment variable pointing to a Java installation.
+
 # Building
 
 Spark uses [Simple Build Tool](http://www.scala-sbt.org), which is bundled with it. To compile the code, go into the top-level Spark directory and run
@@ -35,6 +37,15 @@ or `local` to run locally with one thread, or `local[N]` to run locally with N threads.
 
 Finally, Spark can be used interactively through modified versions of the Scala shell (`./spark-shell`) or Python interpreter (`./pyspark`). These are a great way to learn Spark.
 
+# Running on a Cluster
+
+Spark supports several options for deployment:
+
+* [Amazon EC2](ec2-scripts.html): our scripts let you launch a cluster in about 5 minutes
+* [Standalone Deploy Mode](spark-standalone.html): simplest way to deploy Spark on a private cluster
+* [Apache Mesos](running-on-mesos.html)
+* [Hadoop YARN](running-on-yarn.html)
+
 # A Note About Hadoop Versions
 
 Spark uses the Hadoop-client library to talk to HDFS and other Hadoop-supported
@@ -50,6 +61,8 @@ In addition, if you wish to run Spark on [YARN](running-on-yarn.md), set
 
     SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
 
+(Note that on Windows, you need to set the environment variables on separate lines, e.g., `set SPARK_HADOOP_VERSION=1.2.1`.)
+
 # Where to Go from Here
 
 **Programming guides:**
@@ -90,6 +103,7 @@
 **External resources:**
 
 * [Spark Homepage](http://spark.incubator.apache.org)
+* [Shark](http://shark.cs.berkeley.edu): Apache Hive over Spark
 * [Mailing Lists](http://spark.incubator.apache.org/mailing-lists.html): ask questions about Spark here
 * [AMP Camps](http://ampcamp.berkeley.edu/): a series of training camps at UC Berkeley that featured
   talks and exercises about Spark, Shark, Mesos, and more. [Videos](http://ampcamp.berkeley.edu/agenda-2012),
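
The new paragraph under "Downloading" only asks for a working `java`. A minimal way to verify either condition on a UNIX-like system (the JDK path below is only an example, not something the docs prescribe):

    # either: a java launcher is already on the PATH
    java -version

    # or: point JAVA_HOME at an installation and expose its bin/ directory
    # (example location; substitute wherever your JDK actually lives)
    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk
    export PATH="$JAVA_HOME/bin:$PATH"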
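
To give a flavor of the interactive use the diff mentions, a sketch of a `spark-shell` session; `sc` is the SparkContext the shell provides, and the tiny RDD here is purely illustrative:

    $ ./spark-shell
    scala> val nums = sc.parallelize(1 to 1000)   // distribute a local range as an RDD
    scala> nums.filter(_ % 2 == 0).count()        // run a parallel filter-and-count
    res0: Long = 500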
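
The Windows note added in the Hadoop-versions hunk, written out for the YARN build it follows (cmd.exe syntax; invoking the bundled launcher as `sbt\sbt.cmd` is an assumption, so substitute however you normally start sbt):

    rem each variable goes on its own line on Windows
    set SPARK_HADOOP_VERSION=2.0.5-alpha
    set SPARK_YARN=true
    rem launcher name is an assumption; use your local sbt invocation
    sbt\sbt.cmd assembly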