From 66fb4b11cbae79d9044b2bbf2e53351642a58ff5 Mon Sep 17 00:00:00 2001
From: Patrick Wendell
Date: Fri, 30 May 2014 08:55:36 +0000
Subject: Docs for Spark 1.0.0

---
 site/docs/1.0.0/index.html | 283 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 283 insertions(+)
 create mode 100644 site/docs/1.0.0/index.html

diff --git a/site/docs/1.0.0/index.html b/site/docs/1.0.0/index.html
new file mode 100644
index 000000000..00d836871
--- /dev/null
+++ b/site/docs/1.0.0/index.html
@@ -0,0 +1,283 @@

Spark Overview - Spark 1.0.0 Documentation

Spark Overview

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala and Python, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Shark (Hive on Spark), Spark SQL for structured data, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Downloading

Get Spark from the downloads page of the project website. This documentation is for Spark version 1.0.0. The downloads page contains Spark packages for many popular HDFS versions. If you'd like to build Spark from scratch, see Building Spark with Maven.

Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It's easy to run locally on one machine: all you need is to have Java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.
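
As a rough illustration of those two options, the Python sketch below looks for a `java` executable first under `JAVA_HOME` and then on the `PATH`. The helper name `find_java` and the order in which it checks the two locations are assumptions of this sketch, not Spark's own launcher logic.

```python
import os
import shutil


def find_java(environ=None):
    """Return a path to a java executable, or None if none can be found.

    Hypothetical helper: checks JAVA_HOME first, then searches PATH.
    """
    env = os.environ if environ is None else environ

    # Option 1: JAVA_HOME points at a Java installation (java or java.exe under bin/).
    java_home = env.get("JAVA_HOME")
    if java_home:
        for name in ("java", "java.exe"):
            candidate = os.path.join(java_home, "bin", name)
            if os.path.isfile(candidate):
                return candidate

    # Option 2: a java binary is reachable through the PATH of the same environment.
    return shutil.which("java", path=env.get("PATH", ""))


if __name__ == "__main__":
    print(find_java() or "No Java installation found; install Java or set JAVA_HOME")
```

Passing an explicit `environ` dictionary makes it easy to check either case without touching the real environment.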

Spark runs on Java 6+ and Python 2.6+. For the Scala API, Spark 1.0.0 uses Scala 2.10. You will need to use a compatible Scala version (2.10.x).
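
Scala releases are binary compatible only within the same major.minor line, which is why a 2.10.x release should work with Spark 1.0.0 while 2.9 or 2.11 will not. A small Python sketch of that rule follows; the function names are hypothetical.

```python
def scala_binary_version(version):
    """Return the (major, minor) pair that Scala binary compatibility is keyed on."""
    major, minor = version.split(".")[:2]  # "2.10.4" -> ("2", "10"); patch level ignored
    return int(major), int(minor)


def is_compatible_scala(version, required=(2, 10)):
    """True if `version` is in the same binary line as `required` (2.10 for Spark 1.0.0)."""
    try:
        return scala_binary_version(version) == required
    except ValueError:
        # Malformed strings such as "2" or "two.ten" are treated as incompatible.
        return False


if __name__ == "__main__":
    for v in ("2.10.0", "2.10.4", "2.9.3", "2.11.0"):
        print(v, is_compatible_scala(v))
```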

Running the Examples and Shell

Spark comes with several sample programs. Scala, Java and Python examples are in the examples/src/main directory. To run one of the Java or Scala sample programs, use bin/run-example <class> [params] in the top-level Spark directory. (Behind the scenes, this invokes the more general spark-submit script for launching applications.) For example,

./bin/run-example SparkPi 10
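
The sketch below models how run-example might translate such a command into a spark-submit call, under two assumptions: that a short name such as `SparkPi` is expanded into the `org.apache.spark.examples` package where the bundled examples live, and that the result is passed to spark-submit through its `--class` option together with the examples JAR. The JAR path is a hypothetical placeholder, and the real script may pass further options.

```python
# Package assumed to hold the bundled Scala/Java examples.
EXAMPLES_PACKAGE = "org.apache.spark.examples"


def resolve_example_class(name):
    """Expand a short example name such as "SparkPi" to a fully qualified class name."""
    if name.startswith(EXAMPLES_PACKAGE + "."):
        return name  # already fully qualified
    return EXAMPLES_PACKAGE + "." + name


def run_example_command(name, params, examples_jar="path/to/spark-examples.jar"):
    """Build a spark-submit argument list; `examples_jar` is a hypothetical placeholder."""
    return ["./bin/spark-submit", "--class", resolve_example_class(name), examples_jar, *params]


if __name__ == "__main__":
    print(" ".join(run_example_command("SparkPi", ["10"])))
```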

You can also run Spark interactively through a modified version of the Scala shell. This is a great way to learn the framework.

./bin/spark-shell --master local[2]

The --master option specifies the master URL for a distributed cluster, local to run locally with one thread, or local[N] to run locally with N threads. You should start by using local for testing. For a full list of options, run the Spark shell with the --help option.
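
The rules for the --master value described above fit in a few lines of Python. The hypothetical `local_threads` helper below returns the number of local threads for `local` and `local[N]`, and `None` for anything else, which it simply assumes is a cluster master URL; it does not validate cluster URLs.

```python
import re

# Matches values such as "local[2]" and captures the thread count.
_LOCAL_N = re.compile(r"local\[(\d+)\]")


def local_threads(master):
    """Return the local thread count for "local" or "local[N]", else None (cluster URL)."""
    if master == "local":
        return 1  # "local": run locally with one thread
    match = _LOCAL_N.fullmatch(master)
    if match:
        return int(match.group(1))  # "local[N]": run locally with N threads
    return None  # anything else is assumed to be a distributed cluster's master URL


if __name__ == "__main__":
    print(local_threads("local[2]"))  # prints 2, the value used with spark-shell above
```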

Spark also provides a Python API. To run Spark interactively in a Python interpreter, use bin/pyspark:

./bin/pyspark --master local[2]

Example applications are also provided in Python. For example,

./bin/spark-submit examples/src/main/python/pi.py 10
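
For intuition about what pi.py computes, here is a single-process sketch of the Monte Carlo idea without Spark: sample random points in a square and count how many land inside the inscribed circle. It assumes that the command-line argument (10 above) sets the number of slices, or partitions, and that each slice contributes a fixed number of samples; `samples_per_slice` and its default are assumptions of this sketch. Python's built-in `map` and `functools.reduce` stand in for the distributed map and reduce steps.

```python
import random
from functools import reduce
from operator import add


def estimate_pi(slices=2, samples_per_slice=100000, seed=None):
    """Local, single-process stand-in for the map/reduce that pi.py distributes."""
    rng = random.Random(seed)  # seeded generator so results are reproducible
    n = slices * samples_per_slice

    def hit(_):
        # Sample a point in the square [-1, 1] x [-1, 1]; 1 if it falls inside the unit circle.
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        return 1 if x * x + y * y <= 1 else 0

    # "map" each sample index to 0 or 1, then "reduce" with addition to count the hits.
    count = reduce(add, map(hit, range(n)), 0)
    # The circle covers pi/4 of the square's area, so 4 * (hits / samples) approximates pi.
    return 4.0 * count / n


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(10))
```

On a cluster, the point of splitting the work into slices is that each partition can be sampled in parallel before the partial counts are summed.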

Launching on a Cluster

The Spark cluster mode overview explains the key concepts in running on a cluster. Spark can run by itself or on top of several existing cluster managers. It currently provides several options for deployment:

Where to Go from Here

Programming Guides:

API Docs:

Deployment Guides:

Other Documents:

External Resources:

Community

To get help using Spark or keep up with Spark development, sign up for the user mailing list.

If you’re in the San Francisco Bay Area, there’s a regular Spark meetup every few weeks. Come by to meet the developers and other users.

Finally, if you’d like to contribute code to Spark, read how to contribute.
