Lightning-fast cluster computing

Spark Documentation

Setup instructions, programming guides, and other documentation are available for each version of Spark below:

Read these documents to get started with Spark and its built-in components (MLlib and Spark Streaming).

In addition, this page lists some external resources for learning Spark.

Video Tutorials

Hands-On Exercises

  • Hands-on exercises are available online from Spark Summit 2013. These exercises let you launch a small EC2 cluster, load a dataset, and query it with Spark, Shark, Spark Streaming, and MLlib.

Training Materials

External Tutorials, Blog Posts, and Talks

Books

Examples

Wiki

  • The Spark wiki contains information for developers, such as architecture documents and how to contribute to Spark.

Research Papers

Spark was initially developed as a UC Berkeley research project, and much of its design is documented in papers. The research page describes some of the original motivation and direction. The following papers have been published about Spark and related projects.