---
layout: post
title: Spark Release 0.6.0
categories:
- Releases
tags: []
status: publish
type: post
published: true
meta:
  _edit_last: '4'
---

Spark 0.6.0 is a major release that brings several new features, architectural changes, and performance enhancements. The most visible additions are a standalone deploy mode, a Java API, and expanded documentation, but there are also numerous changes under the hood that improve performance in some cases by as much as 2x. You can download this release as either a source package (2 MB tar.gz) or a prebuilt package (48 MB tar.gz).

### Simpler Deployment

In addition to running on Mesos, Spark now has a standalone deploy mode that lets you quickly launch a cluster without installing an external cluster manager. The standalone mode only needs Java installed on each machine, and Spark deployed to it. In addition, there is experimental support for running on YARN (Hadoop NextGen), currently in a separate branch.
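As a rough sketch, launching a standalone cluster by hand looks something like the following (the `./run` launcher and deploy class names follow the 0.6-era source layout, and the host and port are placeholders, so treat the exact invocations as assumptions rather than quoted docs):

```shell
# On the master node: start the standalone master.
# It logs a spark://HOST:PORT URL that workers connect to.
./run spark.deploy.master.Master

# On each worker node: start a worker, pointing it at the master's URL.
./run spark.deploy.worker.Worker spark://MASTER_HOST:7077
```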

### Java API

Java programmers can now use Spark through a new Java API layer. This layer makes available all of Spark's features, including parallel transformations, distributed datasets, broadcast variables, and accumulators, in a Java-friendly manner.
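As a hedged sketch of what the Java layer looks like in practice (package names follow the 0.6-era `spark.api.java` namespace, and the exact signatures here are assumptions, not quoted API docs):

```java
import java.util.Arrays;
import spark.api.java.JavaRDD;
import spark.api.java.JavaSparkContext;
import spark.api.java.function.Function;

public class JavaApiSketch {
  public static void main(String[] args) {
    // Run against a local, single-threaded Spark instance.
    JavaSparkContext sc = new JavaSparkContext("local", "JavaApiSketch");

    // Distribute a local collection and square each element in parallel.
    JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3, 4));
    JavaRDD<Integer> squares = nums.map(new Function<Integer, Integer>() {
      public Integer call(Integer x) { return x * x; }
    });

    System.out.println(squares.collect());
    sc.stop();
  }
}
```

Since Java in this era lacks closures, transformations take anonymous subclasses of `Function`; broadcast variables and accumulators are exposed through `JavaSparkContext` in the same style.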

### Expanded Documentation

Spark's documentation has been expanded with a new quick start guide, additional deployment instructions, configuration guide, tuning guide, and improved Scaladoc API documentation.

### Engine Changes

Under the hood, Spark 0.6 includes new, custom storage and communication layers brought in from the upcoming Spark Streaming project, which can improve performance over past versions by as much as 2x.

### New APIs

### Enhanced Debugging

Spark's logs now print which operation in your program each RDD and job belongs to, making it easier to trace a problem back to the part of your code that caused it.

### Maven Artifacts

Spark is now available in Maven Central, making it easier to link against from your own programs without having to build its JAR yourself. Use the following Maven identifiers to add it to a project:
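For example, a Maven dependency block would look like this (the coordinates below assume the Scala 2.9.2 build of this release; check Maven Central for the exact artifact name):

```xml
<dependency>
  <groupId>org.spark-project</groupId>
  <artifactId>spark-core_2.9.2</artifactId>
  <version>0.6.0</version>
</dependency>
```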

### Compatibility

This release is source-compatible with Spark 0.5 programs, but you will need to recompile them against 0.6. In addition, the caching configuration has changed: instead of a single `spark.cache.class` parameter that set one caching strategy for all RDDs, you can now choose a storage level per RDD. Spark will warn you if you try to set `spark.cache.class`.
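A minimal sketch of the new per-RDD storage levels, using the Java API for consistency (the `persist` method and `StorageLevel` constant here are assumptions patterned on Spark's storage API rather than quotes from the 0.6 docs):

```java
import spark.api.java.JavaRDD;
import spark.api.java.JavaSparkContext;
import spark.storage.StorageLevel;

public class StorageLevelSketch {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local", "StorageLevelSketch");
    JavaRDD<String> lines = sc.textFile("events.log");

    // Instead of one global spark.cache.class strategy, each RDD now
    // chooses its own storage level, e.g. deserialized in memory:
    lines.persist(StorageLevel.MEMORY_ONLY());

    System.out.println(lines.count());
    sc.stop();
  }
}
```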

### Credits

Spark 0.6 was the work of a large group of contributors, many of them new, from Berkeley and beyond.

Thanks also to all the Spark users who have diligently suggested features or reported bugs.