Adding 0.8.0 release.

author: Patrick Wendell <pwendell@apache.org> 2013-09-25 21:12:46 +0000
committer: Patrick Wendell <pwendell@apache.org> 2013-09-25 21:12:46 +0000
commit: a5051c33c8728dcf2b84ae781b8d9bc05fb0e528 (patch)
tree: b4f19507a7aec2315548756b7ef49f9887eab00d
parent: 43949b33e0f1ae3f52c878852cc81505276be00b (diff)
download: spark-website-a5051c33c8728dcf2b84ae781b8d9bc05fb0e528.tar.gz
spark-website-a5051c33c8728dcf2b84ae781b8d9bc05fb0e528.tar.bz2
spark-website-a5051c33c8728dcf2b84ae781b8d9bc05fb0e528.zip
8 files changed, 143 insertions, 9 deletions
diff --git a/documentation.md b/documentation.md
index 1e8e50285..172585602 100644
--- a/documentation.md
+++ b/documentation.md
@@ -12,7 +12,8 @@ navigation:
 <p>Setup instructions, programming guides, and other documentation are available for each version of Spark below:</p>
 
 <ul>
-  <li><a href="{{site.url}}docs/latest/">Spark 0.7.3 (latest release)</a></li>
+  <li><a href="{{site.url}}docs/latest/">Spark 0.8.0 (latest release)</a></li>
+  <li><a href="{{site.url}}docs/0.7.3/">Spark 0.7.3</a></li>
   <li><a href="{{site.url}}docs/0.6.2/">Spark 0.6.2</a></li>
   <li><a href="https://github.com/mesos/spark/wiki/Spark-0.5-Documentation">Spark 0.5.x</a> (hosted on GitHub)</li>
 </ul>
diff --git a/downloads.md b/downloads.md
index d90e01cc1..1ebbe633b 100644
--- a/downloads.md
+++ b/downloads.md
@@ -8,13 +8,14 @@ navigation:
 ---
 
 <h2>Download Spark</h2>
-The latest release of Spark is 0.7.3. You can either download it as a <a href="http://spark-project.org/download/spark-0.7.3-sources.tgz">source package</a> (4 MB tar.gz) or get prebuilt packages for <a href="http://spark-project.org/download/spark-0.7.3-prebuilt-hadoop1.tgz">Hadoop 1 / CDH3</a> or <a href="http://spark-project.org/download/spark-0.7.3-prebuilt-cdh4.tgz">CDH 4</a> (61 MB tar.gz).
+The latest release of Spark is 0.8.0. You can either download it as a <a href="http://spark-project.org/download/spark-0.8.0-incubating.tgz">source package</a> (4 MB tar.gz) or as a prebuilt package for <a href="http://spark-project.org/download/spark-0.8.0-incubating-bin-hadoop1.tgz">Hadoop 1 / CDH3</a> or <a href="http://spark-project.org/download/spark-0.8.0-incubating-bin-cdh4.tgz">CDH 4</a> (125 MB tar.gz).
 
-If you are interested in working with the newest under-development code or contributing to Spark development, you can also check out the master branch from Git: <tt>git clone git://github.com/mesos/spark.git</tt>.
+If you are interested in working with the newest under-development code or contributing to Spark development, you can also check out the master branch from Git: <tt>git clone git://github.com/apache/incubator-spark.git</tt>.
 
 Once you've downloaded Spark, you can find instructions for installing and building it on the <a href="{{site.url}}documentation.html">documentation page</a>.
 <h3>Previous Releases</h3>
 <ul>
+	<li><a href="http://spark-project.org/download/spark-0.8.0-incubating.tgz">Spark 0.8.0</a> (September 25, 2013) <a href="{{site.url}}releases/spark-release-0-8-0.html">(release notes)</a> (prebuilt: <a href="http://spark-project.org/download/spark-0.8.0-incubating-bin-hadoop1.tgz">Hadoop 1 / CDH3</a>, <a href="http://spark-project.org/download/spark-0.8.0-incubating-bin-cdh4.tgz">CDH 4</a>)</li>
 	<li><a href="http://spark-project.org/download/spark-0.7.3-sources.tgz">Spark 0.7.3</a> (July 16, 2013) <a href="{{site.url}}releases/spark-release-0-7-3.html">(release notes)</a> (prebuilt: <a href="http://spark-project.org/download/spark-0.7.3-prebuilt-hadoop1.tgz">Hadoop 1 / CDH3</a>, <a href="http://spark-project.org/download/spark-0.7.3-prebuilt-cdh4.tgz">CDH 4</a>)</li>
 	<li><a href="http://spark-project.org/download/spark-0.7.2-sources.tgz">Spark 0.7.2</a> (June 2, 2013) <a href="{{site.url}}releases/spark-release-0-7-2.html">(release notes)</a> (prebuilt: <a href="http://spark-project.org/download/spark-0.7.2-prebuilt-hadoop1.tgz">Hadoop 1 / CDH3</a>, <a href="http://spark-project.org/download/spark-0.7.2-prebuilt-cdh4.tgz">CDH 4</a>)</li>
 	<li><a href="http://spark-project.org/download/spark-0.7.0-sources.tgz">Spark 0.7.0</a> (February 27, 2013) <a href="{{site.url}}releases/spark-release-0-7-0.html">(release notes)</a></li>
diff --git a/examples.md b/examples.md
index 36e7b9658..36c8415e6 100644
--- a/examples.md
+++ b/examples.md
@@ -8,7 +8,7 @@ navigation:
 ---
 <h2>Spark Examples</h2>
 
-Spark is built around <em>distributed datasets</em> that support types of parallel operations: transformations, which are lazy and yield another distributed dataset (e.g., <code>map</code>, <code>filter</code>, and <code>join</code>), and actions, which force the computation of a dataset and return a result (e.g., <code>count</code>). The following examples show off some of the available operations and features.
+Spark is built around <em>distributed datasets</em> that support two types of parallel operations: transformations, which are lazy and yield another distributed dataset (e.g., <code>map</code>, <code>filter</code>, and <code>join</code>), and actions, which force the computation of a dataset and return a result (e.g., <code>count</code>). The following examples show off some of the available operations and features. Several additional examples are distributed with Spark, both for core Spark ([Scala examples](https://github.com/apache/incubator-spark/tree/master/examples/src/main/scala/org/apache/spark/examples), [Java examples](https://github.com/apache/incubator-spark/tree/master/examples/src/main/java/org/apache/spark/examples), [Python examples](https://github.com/apache/incubator-spark/tree/master/python/examples)) and for Spark Streaming ([Scala examples](https://github.com/apache/incubator-spark/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples), [Java examples](https://github.com/apache/incubator-spark/tree/master/examples/src/main/java/org/apache/spark/streaming/examples)).
 
 <h3>Text Search</h3>
 
diff --git a/releases/_posts/2013-09-25-spark-release-0-8-0.md b/releases/_posts/2013-09-25-spark-release-0-8-0.md
new file mode 100644
index 000000000..7008e4447
--- /dev/null
+++ b/releases/_posts/2013-09-25-spark-release-0-8-0.md
@@ -0,0 +1,130 @@
+---
+layout: post
+title: Spark Release 0.8.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+Spark 0.8.0 is a major release that includes many new capabilities and usability improvements. It’s also our first release under the Apache incubator. It is the largest Spark release yet, with contributions from 68 developers and 24 companies.
+
+You can download Spark 0.8.0 as either a <a href="http://spark-project.org/download/spark-0.8.0-incubating.tgz">source package</a> (4 MB tar.gz) or a prebuilt package for <a href="http://spark-project.org/download/spark-0.8.0-incubating-bin-hadoop1.tgz">Hadoop 1 / CDH3</a> or <a href="http://spark-project.org/download/spark-0.8.0-incubating-bin-cdh4.tgz">CDH4</a> (125 MB tar.gz).
+
+### Monitoring UI and Metrics
+Spark now displays a variety of monitoring data in a web UI (by default at port 4040 on the driver node). A new job dashboard contains information about running, succeeded, and failed jobs, including percentile statistics covering task runtime, shuffled data, and garbage collection. The existing storage dashboard has been extended, and additional pages have been added to display total storage and task information per executor. Finally, a new metrics library exposes internal Spark metrics through various APIs, including JMX and Ganglia.
+
+<p style="text-align: center;">
+<img src="{{site.root}}/images/0.8.0-ui-screenshot.png" style="width:600px;">
+</p>
+
+### Machine Learning Library
+This release introduces MLlib, a standard library of high-quality machine learning and optimization algorithms for Spark. MLlib was developed in collaboration with the [U.C. Berkeley MLBase project](http://www.mlbase.org/). The current library contains seven algorithms, including support vector machines (SVMs), logistic regression, several regularized variants of linear regression, a clustering algorithm (KMeans), and alternating least squares collaborative filtering.
+
+### Python Improvements
+The Python API has been extended with many previously missing features. This includes support for different storage levels, sampling, and various missing RDD operators. We’ve also added support for running Spark in [IPython](http://ipython.org/), including the IPython Notebook, and for running PySpark on Windows.
+
+### Hadoop YARN support
+Spark 0.8 adds greatly improved support for running standalone Spark jobs on a YARN cluster. YARN support is no longer experimental and is now part of mainline Spark. Support for running against a secured YARN cluster has also been added.
+
+### Revamped Job Scheduler
+Spark’s internal job scheduler has been refactored and extended to include more sophisticated scheduling policies. In particular, a [fair scheduler](http://spark.incubator.apache.org/docs/0.8.0/job-scheduling.html#scheduling-within-an-application) implementation now allows multiple users to share an instance of Spark, which helps users running shorter jobs achieve good performance even when longer-running jobs are running in parallel. Support for topology-aware scheduling has been extended, including the ability to take rack locality into account and support for multiple executors on a single machine.
+
+### Easier Deployment and Linking
+User programs can now link to Spark no matter which Hadoop version they need, without having to publish a version of `spark-core` specifically for that Hadoop version. An explanation of how to link against different Hadoop versions is provided [here](http://spark.incubator.apache.org/docs/0.8.0/scala-programming-guide.html#linking-with-spark). 
+
+### Expanded EC2 Capabilities
+Spark’s EC2 scripts now support launching in any availability zone. Support has also been added for EC2 instance types which use the newer “HVM” architecture. This includes the cluster compute (cc1/cc2) family of instance types. We’ve also added support for running newer versions of HDFS alongside Spark. Finally, we’ve added the ability to launch clusters with maintenance releases of Spark in addition to launching the newest release.
+
+### Improved Documentation
+This release adds documentation about cluster hardware provisioning and interoperation with common Hadoop distributions. Docs are also included to cover the MLlib machine learning functions and new cluster monitoring features. Existing documentation has been updated to reflect changes in building and deploying Spark.
+
+### Other Improvements
+* RDDs can now be manually dropped from memory with `unpersist`.
+* The RDD class includes the following new operations: `takeOrdered`, `zipPartitions`, `top`.
+* A `JobLogger` class has been added to produce archivable logs of a Spark workload.
+* The `RDD.coalesce` function now takes locality into account.
+* The `RDD.pipe` function has been extended to support passing environment variables to child processes.
+* Hadoop `save` functions now support an optional compression codec.
+* You can now create a binary distribution of Spark which depends only on a Java runtime for easier deployment on a cluster.
+* The examples build has been isolated from the core build, substantially reducing the potential for dependency conflicts.
+* The Spark Streaming Twitter API has been updated to use OAuth authentication instead of the deprecated username/password authentication used in Spark 0.7.0.
+* Several new example jobs have been added, including PageRank implementations in Java, Scala and Python, examples for accessing HBase and Cassandra, and MLlib examples.
+* Support for running on Mesos has been improved: you can now deploy a Spark assembly JAR as part of the Mesos job instead of having Spark pre-installed on each machine. The default Mesos version has also been updated to 0.13.
+* This release includes various optimizations to PySpark and to the job scheduler.
+ 
+### Compatibility
+* <strong>This release changes Spark’s package name to 'org.apache.spark'</strong>, so those upgrading from Spark 0.7 will need to adjust their imports accordingly. In addition, we’ve moved the `RDD` class to the `org.apache.spark.rdd` package (it was previously in the top-level package). The Spark artifacts published through Maven now also use the new `org.apache.spark` group ID.
+* In the Java API, use of Scala’s `Option` class has been replaced with `Optional` from the Guava library.
+* Linking against Spark for arbitrary Hadoop versions is now possible by specifying a dependency on `hadoop-client`, instead of rebuilding `spark-core` against your version of Hadoop. See the documentation [here](http://spark.incubator.apache.org/docs/0.8.0/scala-programming-guide.html#linking-with-spark) for details.
+* If you are building Spark, you’ll now need to run `sbt/sbt assembly` instead of `sbt/sbt package`.
+
+
+### Credits
+Spark 0.8.0 was the result of the largest team of contributors yet. The following developers contributed to this release:
+
+* Andrew Ash -- documentation, code cleanup and logging improvements
+* Mikhail Bautin -- bug fix
+* Konstantin Boudnik -- Maven build, bug fixes, and documentation
+* Ian Buss -- sbt configuration improvement
+* Evan Chan -- API improvement, bug fix, and documentation
+* Lian Cheng -- bug fix
+* Tathagata Das -- performance improvement in streaming receiver and streaming bug fix
+* Aaron Davidson -- Python improvements, bug fix, and unit tests
+* Joseph E. Gonzalez -- improvement to zipPartitions
+* Karen Feng -- several improvements to web UI
+* Andy Feng -- HDFS metrics
+* Ali Ghodsi -- configuration improvements and locality-aware coalesce
+* Thomas Graves -- support for secure YARN cluster and various YARN-related improvements
+* Stephen Haberman -- bug fix, documentation, and code cleanup
+* Mark Hamstra -- bug fixes and Maven build
+* Benjamin Hindman -- Mesos compatibility and documentation
+* Liang-Chi Hsieh -- bug fixes in build and in YARN mode
+* Shane Huang -- shuffle improvements, bug fix
+* Ethan Jewett -- Spark/HBase example
+* Holden Karau -- bug fix and EC2 improvement
+* Andy Konwinski -- documentation
+* Jey Kottalam -- PySpark optimizations, Hadoop agnostic build (lead), and bug fixes
+* S. Kumar -- Spark Streaming example
+* Ryan LeCompte -- topK method optimization and serialization improvements
+* Gavin Li -- compression codecs and pipe support
+* Harold Lim -- fair scheduler
+* Dmitriy Lyubimov -- bug fix
+* Chris Mattmann -- Apache mentor
+* Sean McNamara -- added `takeOrdered` function, bug fixes, and a build fix
+* Mridul Muralidharan -- YARN integration (lead) and scheduler improvements
+* Marc Mercer -- improvements to UI JSON output
+* Christopher Nguyen -- bug fixes
+* Kay Ousterhout -- fix for scheduler regression and bug fixes
+* Xinghao Pan -- MLlib contributions
+* Nick Pentreath -- Scala PageRank example
+* Alexander Pivovarov -- logging improvement and Maven build
+* Mike Potts -- configuration improvement
+* Imran Rashid -- bug fix and UI improvement
+* Charles Reiss -- bug fixes, code cleanup, performance improvements
+* Josh Rosen -- Python API improvements, Java API improvements, EC2 scripts and bug fixes
+* Henry Saputra -- Apache mentor
+* Jerry Shao -- bug fixes, metrics system
+* Prashant Sharma -- documentation
+* Mingfei Shi -- JobLogger and bug fix
+* Andre Schumacher -- several PySpark features
+* Ginger Smith -- MLlib contribution
+* Evan Sparks -- contributions to MLlib
+* Ram Sriharsha -- bug fix and RDD removal feature
+* Ameet Talwalkar -- MLlib contributions
+* Roman Tkalenko -- code refactoring and cleanup
+* Chu Tong -- Java PageRank algorithm and bug fix in bash scripts
+* Shivaram Venkataraman -- bug fixes, contributions to MLlib, Netty shuffle fixes, and Java API additions
+* Patrick Wendell -- release manager, bug fixes, documentation, metrics system, and web UI
+* Andrew Xia -- fair scheduler (lead), metrics system, and UI improvements
+* Reynold Xin -- shuffle improvements, bug fixes, code refactoring, usability improvements, and MLlib contributions
+* Matei Zaharia -- MLlib contributions, documentation, examples, UI improvements, PySpark improvements, and bug fixes
+* Wu Zeming -- bug fix in scheduler
+* Bill Zhao -- log message improvement
+
+
+Thanks to everyone who contributed!
+We’d especially like to thank Patrick Wendell for acting as the release manager for this release.
diff --git a/site/docs/latest b/site/docs/latest
index b09a54cb9..8adc70fdd 120000
--- a/site/docs/latest
+++ b/site/docs/latest
@@ -1 +1 @@
-0.7.3
-\ No newline at end of file
+0.8.0
+\ No newline at end of file
diff --git a/site/documentation.html b/site/documentation.html
index abadbfd36..5ee734d0b 100644
--- a/site/documentation.html
+++ b/site/documentation.html
@@ -112,7 +112,8 @@
 <p>Setup instructions, programming guides, and other documentation are available for each version of Spark below:</p>
 
 <ul>
-  <li><a href="/docs/latest/">Spark 0.7.3 (latest release)</a></li>
+  <li><a href="/docs/latest/">Spark 0.8.0 (latest release)</a></li>
+  <li><a href="/docs/0.7.3/">Spark 0.7.3</a></li>
   <li><a href="/docs/0.6.2/">Spark 0.6.2</a></li>
   <li><a href="https://github.com/mesos/spark/wiki/Spark-0.5-Documentation">Spark 0.5.x</a> (hosted on GitHub)</li>
 </ul>
diff --git a/site/downloads.html b/site/downloads.html
index 4f7d1f8c9..0a717d10c 100644
--- a/site/downloads.html
+++ b/site/downloads.html
@@ -108,13 +108,14 @@
         
           <article class="page type-page status-publish hentry">
             <h2>Download Spark</h2>
-<p>The latest release of Spark is 0.7.3. You can either download it as a <a href="http://spark-project.org/download/spark-0.7.3-sources.tgz">source package</a> (4 MB tar.gz) or get prebuilt packages for <a href="http://spark-project.org/download/spark-0.7.3-prebuilt-hadoop1.tgz">Hadoop 1 / CDH3</a> or <a href="http://spark-project.org/download/spark-0.7.3-prebuilt-cdh4.tgz">CDH 4</a> (61 MB tar.gz).</p>
+<p>The latest release of Spark is 0.8.0. You can either download it as a <a href="http://spark-project.org/download/spark-0.8.0-incubating.tgz">source package</a> (4 MB tar.gz) or as a prebuilt package for <a href="http://spark-project.org/download/spark-0.8.0-incubating-bin-hadoop1.tgz">Hadoop 1 / CDH3</a> or <a href="http://spark-project.org/download/spark-0.8.0-incubating-bin-cdh4.tgz">CDH 4</a> (125 MB tar.gz).</p>
 
-<p>If you are interested in working with the newest under-development code or contributing to Spark development, you can also check out the master branch from Git: <tt>git clone git://github.com/mesos/spark.git</tt>.</p>
+<p>If you are interested in working with the newest under-development code or contributing to Spark development, you can also check out the master branch from Git: <tt>git clone git://github.com/apache/incubator-spark.git</tt>.</p>
 
 <p>Once you&#8217;ve downloaded Spark, you can find instructions for installing and building it on the <a href="/documentation.html">documentation page</a>.</p>
 <h3>Previous Releases</h3>
 <ul>
+	<li><a href="http://spark-project.org/download/spark-0.8.0-incubating.tgz">Spark 0.8.0</a> (September 25, 2013) <a href="/releases/spark-release-0-8-0.html">(release notes)</a> (prebuilt: <a href="http://spark-project.org/download/spark-0.8.0-incubating-bin-hadoop1.tgz">Hadoop 1 / CDH3</a>, <a href="http://spark-project.org/download/spark-0.8.0-incubating-bin-cdh4.tgz">CDH 4</a>)</li>
 	<li><a href="http://spark-project.org/download/spark-0.7.3-sources.tgz">Spark 0.7.3</a> (July 16, 2013) <a href="/releases/spark-release-0-7-3.html">(release notes)</a> (prebuilt: <a href="http://spark-project.org/download/spark-0.7.3-prebuilt-hadoop1.tgz">Hadoop 1 / CDH3</a>, <a href="http://spark-project.org/download/spark-0.7.3-prebuilt-cdh4.tgz">CDH 4</a>)</li>
 	<li><a href="http://spark-project.org/download/spark-0.7.2-sources.tgz">Spark 0.7.2</a> (June 2, 2013) <a href="/releases/spark-release-0-7-2.html">(release notes)</a> (prebuilt: <a href="http://spark-project.org/download/spark-0.7.2-prebuilt-hadoop1.tgz">Hadoop 1 / CDH3</a>, <a href="http://spark-project.org/download/spark-0.7.2-prebuilt-cdh4.tgz">CDH 4</a>)</li>
 	<li><a href="http://spark-project.org/download/spark-0.7.0-sources.tgz">Spark 0.7.0</a> (February 27, 2013) <a href="/releases/spark-release-0-7-0.html">(release notes)</a></li>
diff --git a/site/examples.html b/site/examples.html
index be37739a0..fd051b9b5 100644
--- a/site/examples.html
+++ b/site/examples.html
@@ -109,7 +109,7 @@
           <article class="page type-page status-publish hentry">
             <h2>Spark Examples</h2>
 
-<p>Spark is built around <em>distributed datasets</em> that support types of parallel operations: transformations, which are lazy and yield another distributed dataset (e.g., <code>map</code>, <code>filter</code>, and <code>join</code>), and actions, which force the computation of a dataset and return a result (e.g., <code>count</code>). The following examples show off some of the available operations and features.</p>
+<p>Spark is built around <em>distributed datasets</em> that support two types of parallel operations: transformations, which are lazy and yield another distributed dataset (e.g., <code>map</code>, <code>filter</code>, and <code>join</code>), and actions, which force the computation of a dataset and return a result (e.g., <code>count</code>). The following examples show off some of the available operations and features. Several additional examples are distributed with Spark, both for core Spark (<a href="https://github.com/apache/incubator-spark/tree/master/examples/src/main/scala/org/apache/spark/examples">Scala examples</a>, <a href="https://github.com/apache/incubator-spark/tree/master/examples/src/main/java/org/apache/spark/examples">Java examples</a>, <a href="https://github.com/apache/incubator-spark/tree/master/python/examples">Python examples</a>) and for Spark Streaming (<a href="https://github.com/apache/incubator-spark/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples">Scala examples</a>, <a href="https://github.com/apache/incubator-spark/tree/master/examples/src/main/java/org/apache/spark/streaming/examples">Java examples</a>).</p>
 
 <h3>Text Search</h3>