<!DOCTYPE html>
<!--[if IE 6]>
<html id="ie6" dir="ltr" lang="en-US">
<![endif]-->
<!--[if IE 7]>
<html id="ie7" dir="ltr" lang="en-US">
<![endif]-->
<!--[if IE 8]>
<html id="ie8" dir="ltr" lang="en-US">
<![endif]-->
<!--[if !(IE 6) | !(IE 7) | !(IE 8) ]><!-->
<html dir="ltr" lang="en-US">
<!--<![endif]-->
<head>
<link rel="shortcut icon" href="/favicon.ico" />
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width" />
<title>
Documentation | Apache Spark
</title>
<link rel="stylesheet" type="text/css" media="all" href="/css/style.css" />
<link rel="stylesheet" href="/css/pygments-default.css">
<script type="text/javascript">
<!-- Google Analytics initialization -->
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-32518208-2']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
<!-- Adds slight delay to links to allow async reporting -->
function trackOutboundLink(link, category, action) {
try {
_gaq.push(['_trackEvent', category , action]);
} catch(err){}
setTimeout(function() {
document.location.href = link.href;
}, 100);
}
</script>
<link rel='canonical' href='/index.html' />
<style type="text/css">
#site-title,
#site-description {
position: absolute !important;
clip: rect(1px 1px 1px 1px); /* IE6, IE7 */
clip: rect(1px, 1px, 1px, 1px);
}
</style>
<style type="text/css" id="custom-background-css">
body.custom-background { background-color: #f1f1f1; }
</style>
</head>
<body class="page singular">
<div id="page" class="hfeed">
<header id="branding" role="banner">
<hgroup>
<h1 id="site-title"><span><a href="/" title="Spark" rel="home">Spark</a></span></h1>
<h2 id="site-description">Lightning-Fast Cluster Computing</h2>
</hgroup>
<a id="main-logo" href="/">
<img style="height:175px; width:auto;" src="/images/spark-project-header1-cropped.png" alt="Spark: Lightning-Fast Cluster Computing" title="Spark: Lightning-Fast Cluster Computing" />
</a>
<div class="widget-summit">
<a href="http://spark-summit.org"><img src="/images/Summit-Logo-FINALtr-150x150px.png" alt="Spark Summit" /></a>
<div class="text">
<a href="http://spark-summit.org/2013">
<strong>Videos and Slides<br/>
Available Now!</strong>
</a>
</div>
</div>
<nav id="access" role="navigation">
<h3 class="assistive-text">Main menu</h3>
<div class="menu-main-menu-container">
<ul id="menu-main-menu" class="menu">
<li class="menu-item menu-item-type-post_type menu-item-object-page ">
<a href="/index.html">Home</a>
</li>
<li class="menu-item menu-item-type-post_type menu-item-object-page ">
<a href="/downloads.html">Downloads</a>
</li>
<li class="menu-item menu-item-type-post_type menu-item-object-page current-menu-item">
<a href="/documentation.html">Documentation</a>
</li>
<li class="menu-item menu-item-type-post_type menu-item-object-page ">
<a href="/examples.html">Examples</a>
</li>
<li class="menu-item menu-item-type-post_type menu-item-object-page ">
<a href="/mailing-lists.html">Mailing Lists</a>
</li>
<li class="menu-item menu-item-type-post_type menu-item-object-page ">
<a href="/research.html">Research</a>
</li>
<li class="menu-item menu-item-type-post_type menu-item-object-page ">
<a href="/faq.html">FAQ</a>
</li>
</ul></div>
</nav><!-- #access -->
</header><!-- #branding -->
<div id="main">
<div id="primary">
<div id="content" role="main">
<article class="page type-page status-publish hentry">
<h2>Spark Documentation</h2>
<p>Setup instructions, programming guides, and other documentation are available for each version of Spark below:</p>
<ul>
<li><a href="/docs/latest/">Spark 0.8.0 (latest release)</a></li>
<li><a href="/docs/0.7.3/">Spark 0.7.3</a></li>
<li><a href="/docs/0.6.2/">Spark 0.6.2</a></li>
<li><a href="https://github.com/mesos/spark/wiki/Spark-0.5-Documentation">Spark 0.5.x</a> (hosted on GitHub)</li>
</ul>
<p>Read these documents to get started with Spark. This page also lists external resources for learning Spark.</p>
<h3>Video Tutorials</h3>
<ul>
<li><a href="/screencasts/1-first-steps-with-spark.html">Screencast 1: First Steps with Spark</a></li>
<li><a href="/screencasts/2-spark-documentation-overview.html">Screencast 2: Spark Documentation Overview</a></li>
<li><a href="/screencasts/3-transformations-and-caching.html">Screencast 3: Transformations and Caching</a></li>
<li><a href="/screencasts/4-a-standalone-job-in-spark.html">Screencast 4: A Spark Standalone Job in Scala</a></li>
</ul>
<h3>Hands-On Exercises</h3>
<ul>
<li><a href="http://ampcamp.berkeley.edu/3/exercises/">Hands-on exercises</a> are available online. These exercises let you launch a small EC2 cluster, load a dataset, and query it with Spark, Shark, Spark Streaming, and MLlib.</li>
</ul>
<h3>Spark Summit Slides and Videos</h3>
<ul>
<li><a href="http://spark-summit.org/2013">Spark Summit 2013</a> was held in downtown San Francisco in December 2013. Slides and videos of all talks are available for free. Look for the links next to each talk title on the event agenda.</li>
</ul>
<h3>AMP Camp Slides and Videos</h3>
<ul>
<li>The <a href="https://amplab.cs.berkeley.edu/">UC Berkeley AMPLab</a> regularly hosts two-day training camps on Spark and related "big data" components.
Slides and videos from each camp are posted online:
<br /><a href="http://ampcamp.berkeley.edu/3/">AMP Camp Three</a> <em>Big Data Bootcamp Berkeley</em> (August 2013)
<br /><a href="http://ampcamp.berkeley.edu/amp-camp-two-strata-2013/">AMP Camp Two</a> <em>Big Data Bootcamp Strata</em> (February 2013)
<br /><a href="http://ampcamp.berkeley.edu/agenda-2012/">AMP Camp One</a> <em>Big Data Bootcamp Berkeley</em> (August 2012)
</li>
</ul>
<h3>Books</h3>
<ul>
<li><a href="http://www.packtpub.com/fast-data-processing-with-spark/book">Fast Data Processing with Spark</a>, by Holden Karau (Packt Publishing)</li>
</ul>
<h3>External Tutorials, Development Blogs, and Talks</h3>
<ul>
<li><a href="http://www.pwendell.com/2013/09/28/declarative-streams.html">Sampling Twitter Using Declarative Streams</a> -- Spark Streaming tutorial by Patrick Wendell</li>
<li><a href="http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/">A Powerful Big Data Trio: Spark, Parquet and Avro</a> -- Using Parquet in Spark by Matt Massie</li>
<li><a href="http://www.slideshare.net/EvanChan2/cassandra2013-spark-talk-final">Real-time Analytics with Cassandra, Spark, and Shark</a> -- Presentation by Evan Chan from Ooyala at the 2013 Cassandra Summit</li>
<li><a href="http://syndeticlogic.net/?p=311">Getting Spark Setup in Eclipse</a> -- Developer blog post by James Percent</li>
<li><a href="http://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923">Run Spark and Shark on Amazon Elastic MapReduce</a> -- Article by Amazon AWS Elastic MapReduce team member Parviz Deyhim</li>
<li><a href="http://blog.quantifind.com/posts/spark-unit-test/">Unit testing with Spark</a> -- Quantifind tech blog post by Imran Rashid</li>
<li><a href="http://blog.quantifind.com/posts/logging-post/">Configuring Spark logs</a> -- Quantifind tech blog by Imran Rashid</li>
<li><a href="http://www.ibm.com/developerworks/library/os-spark/">Spark, an alternative for fast data analytics</a> -- IBM developerWorks article by M. Tim Jones</li>
</ul>
<h3>Spark Internals</h3>
<ul>
<li><a href="http://www.youtube.com/watch?v=49Hr5xZyTEA">Overview of Spark Internals [advanced]</a> (<a href="/talks/dev-meetup-dec-2012.pptx">pptx</a>) (<a href="http://www.youtube.com/watch?v=49Hr5xZyTEA">video</a>)</li>
</ul>
<h3>Research Papers</h3>
<ul>
<li>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-214.pdf">Shark: SQL and Rich Analytics at Scale</a>. Reynold Xin, Joshua Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica. <em>Technical Report UCB/EECS-2012-214</em>. November 2012.
</li>
<li>
<a href="http://www.cs.berkeley.edu/~matei/papers/2012/hotcloud_spark_streaming.pdf">Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters</a>. Matei Zaharia, Tathagata Das, Haoyuan Li, Scott Shenker, Ion Stoica. <em>HotCloud 2012</em>. June 2012.
</li>
<li>
<a href="http://www.cs.berkeley.edu/~matei/papers/2012/sigmod_shark_demo.pdf">Shark: Fast Data Analysis Using Coarse-grained Distributed Memory</a> (demo). Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Haoyuan Li, Scott Shenker, Ion Stoica. <em>SIGMOD 2012</em>. May 2012. <b>Best Demo Award</b>.
</li>
<li>
<a href="http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf">Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing</a>. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica. <em>NSDI 2012</em>. April 2012. <b>Best Paper Award</b> and <b>Honorable Mention for Community Award</b>.
</li>
<li>
<a href="http://www.cs.berkeley.edu/~matei/papers/2011/tr_spark.pdf">Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing</a>. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica. <em>Technical Report UCB/EECS-2011-82</em>. July 2011.</li>
<li>
<a href="http://www.cs.berkeley.edu/~matei/papers/2010/hotcloud_spark.pdf">Spark: Cluster Computing with Working Sets</a>. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. <em>HotCloud 2010</em>. June 2010.
</li>
</ul>
</article><!-- #post -->
</div><!-- #content -->
<footer id="colophon" role="contentinfo">
<div id="site-generator">
<p style="padding-top: 0; padding-bottom: 15px;">
Apache Spark is an effort undergoing incubation at The Apache Software Foundation.
<a href="http://incubator.apache.org/" style="border: none;">
<img style="vertical-align: middle; border: none;" src="/images/incubator-logo.png" alt="Apache Incubator" title="Apache Incubator" />
</a>
</p>
</div>
</footer><!-- #colophon -->
</div><!-- #primary -->
</div><!-- #main -->
</div><!-- #page -->
</body>
</html>