From 81d6089b47ec4d3e7fe17074f3b5fadec8070071 Mon Sep 17 00:00:00 2001 From: Andy Konwinski Date: Fri, 23 Aug 2013 17:17:53 +0000 Subject: Initial port of Spark website from spark-project.org wordpress to Jekyll. --- releases/_posts/2011-07-14-spark-release-0-3.md | 62 ++++++++++++ releases/_posts/2012-06-12-spark-release-0-5-0.md | 36 +++++++ releases/_posts/2012-10-11-spark-release-0-5-1.md | 46 +++++++++ releases/_posts/2012-10-15-spark-release-0-6-0.md | 90 +++++++++++++++++ releases/_posts/2012-11-22-spark-release-0-5-2.md | 15 +++ releases/_posts/2012-11-22-spark-release-0-6-1.md | 30 ++++++ releases/_posts/2013-02-07-spark-release-0-6-2.md | 43 +++++++++ releases/_posts/2013-02-27-spark-release-0-7-0.md | 112 ++++++++++++++++++++++ releases/_posts/2013-06-02-spark-release-0-7-2.md | 56 +++++++++++ releases/_posts/2013-07-16-spark-release-0-7-3.md | 49 ++++++++++ 10 files changed, 539 insertions(+) create mode 100644 releases/_posts/2011-07-14-spark-release-0-3.md create mode 100644 releases/_posts/2012-06-12-spark-release-0-5-0.md create mode 100644 releases/_posts/2012-10-11-spark-release-0-5-1.md create mode 100644 releases/_posts/2012-10-15-spark-release-0-6-0.md create mode 100644 releases/_posts/2012-11-22-spark-release-0-5-2.md create mode 100644 releases/_posts/2012-11-22-spark-release-0-6-1.md create mode 100644 releases/_posts/2013-02-07-spark-release-0-6-2.md create mode 100644 releases/_posts/2013-02-27-spark-release-0-7-0.md create mode 100644 releases/_posts/2013-06-02-spark-release-0-7-2.md create mode 100644 releases/_posts/2013-07-16-spark-release-0-7-3.md (limited to 'releases') diff --git a/releases/_posts/2011-07-14-spark-release-0-3.md b/releases/_posts/2011-07-14-spark-release-0-3.md new file mode 100644 index 000000000..4238398f4 --- /dev/null +++ b/releases/_posts/2011-07-14-spark-release-0-3.md @@ -0,0 +1,62 @@ +--- +layout: post +title: Spark Release 0.3 +categories: +- Releases +tags: [] +status: publish +type: post +published: true +--- +Spark 0.3 brings a variety of new features. You can download it for either Scala 2.9 or Scala 2.8. + +

<h3>Scala 2.9 Support</h3>

This is the first release to support Scala 2.9 in addition to 2.8. Future releases are likely to be 2.9-only unless there is high demand for 2.8.

<h3>Save Operations</h3>

You can now save distributed datasets to the Hadoop filesystem (HDFS), Amazon S3, Hypertable, and any other storage system supported by Hadoop. There are convenience methods for several common formats, like text files and SequenceFiles. For example, to save a dataset as text:

    val numbers = spark.parallelize(1 to 100)
    numbers.saveAsTextFile("hdfs://...")

<h3>Native Types for SequenceFiles</h3>

When working with SequenceFiles, which store objects that implement Hadoop's Writable interface, Spark will now let you use native types for certain common Writable types, like IntWritable and Text. For example:

    // Will read a SequenceFile of (IntWritable, Text)
    val data = spark.sequenceFile[Int, String]("hdfs://...")

Similarly, you can save datasets of basic types directly as SequenceFiles:

    // Will write a SequenceFile of (IntWritable, IntWritable)
    val squares = spark.parallelize(1 to 100).map(n => (n, n*n))
    squares.saveAsSequenceFile("hdfs://...")

<h3>Maven Integration</h3>

Spark now fetches dependencies via Maven and can publish Maven artifacts for easier dependency management.

<h3>Faster Broadcast & Shuffle</h3>

This release includes broadcast and shuffle algorithms from this paper to better support applications that communicate large amounts of data.
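
For readers new to the feature these algorithms sit behind, here is a minimal sketch of broadcasting a read-only lookup table to the cluster. The local master, the sample data, and the pre-1.0 `spark` package name are assumptions for illustration, not details from this release.

```scala
import spark.SparkContext  // pre-1.0 package layout (assumed)

val sc = new SparkContext("local", "BroadcastExample")

// Ship a read-only lookup table to each node once, rather than inside every task's closure.
val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2, "c" -> 3))
val codes = sc.parallelize(Seq("a", "b", "a", "c"))
val resolved = codes.map(s => lookup.value.getOrElse(s, 0))
resolved.collect()  // Array(1, 2, 1, 3)
```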

<h3>Support for Non-Filesystem Hadoop Input Formats</h3>

The new SparkContext.hadoopRDD method allows reading data from Hadoop-compatible storage systems other than file systems, such as HBase, Hypertable, etc.

<h3>Other Features</h3>

diff --git a/releases/_posts/2012-06-12-spark-release-0-5-0.md b/releases/_posts/2012-06-12-spark-release-0-5-0.md
new file mode 100644
index 000000000..df27a0996
--- /dev/null
+++ b/releases/_posts/2012-06-12-spark-release-0-5-0.md
@@ -0,0 +1,36 @@
---
layout: post
title: Spark Release 0.5.0
categories:
- Releases
tags: []
status: publish
type: post
published: true
meta:
  _edit_last: '1'
---
Spark 0.5.0 brings several new features and sets the stage for some big changes coming this summer as we incorporate code from the Spark Streaming project. You can download it as a zip or tar.gz.

<h3>Mesos 0.9 Support</h3>

This release runs on Apache Mesos 0.9, the first Apache Incubator release of Mesos, which contains significant usability and stability improvements. Most notable are better memory accounting for applications with long-term memory use, easier access to old jobs' traces and logs (by keeping a history of executed tasks on the web UI), and simpler installation.

<h3>Performance Improvements</h3>

Spark's scheduling is more communication-efficient when sending out operations on RDDs with large lineage graphs. In addition, the cache replacement policy now evicts data more intelligently when an RDD does not fit in the cache, shuffles are more efficient, and the serializer used for shipping closures is configurable, so faster libraries than Java serialization can be used there.

<h3>Debug Improvements</h3>

Spark now reports exceptions on the worker nodes back to the master, so you can see them all in one log file. It also automatically marks and filters duplicate errors.

<h3>New Operators</h3>

These include sortByKey for parallel sorting, takeSample, and more efficient fold and aggregate operators. In addition, more of the old operators make use of, and retain, RDD partitioning information to reduce communication cost. For example, if you join two hash-partitioned RDDs that were partitioned in the same way, Spark will not shuffle any data across the network.
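
As a rough sketch of the new operators (the local master, the sample data, and the pre-1.0 `spark` package layout are illustrative assumptions; result ordering may vary):

```scala
import spark.SparkContext
import spark.SparkContext._  // implicit conversions adding pair-RDD operations such as sortByKey and join

val sc = new SparkContext("local", "NewOperatorsExample")

val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
pairs.sortByKey().collect()      // Array((a,1), (b,2), (c,3)), sorted in parallel
pairs.takeSample(false, 2, 42)   // two elements sampled without replacement

val nums = sc.parallelize(1 to 100)
nums.fold(0)(_ + _)              // 5050, using the more efficient fold
nums.aggregate((0, 0))(          // sum and count in one pass with aggregate
  (acc, n) => (acc._1 + n, acc._2 + 1),
  (a, b) => (a._1 + b._1, a._2 + b._2))

val other = sc.parallelize(Seq(("a", "x"), ("c", "y")))
pairs.join(other).collect()      // Array((a,(1,x)), (c,(3,y)))
```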

<h3>EC2 Launch Script Improvements</h3>

Spark's EC2 launch scripts are now included in the main package, and they can automatically discover and use the latest Spark AMI instead of launching a hardcoded machine image ID.

<h3>New Hadoop API Support</h3>

You can now use Spark to read and write data to storage formats in the new org.apache.hadoop.mapreduce packages (the "new Hadoop" API). In addition, this release fixes an issue caused by an HDFS initialization bug in some recent HDFS versions.

diff --git a/releases/_posts/2012-10-11-spark-release-0-5-1.md b/releases/_posts/2012-10-11-spark-release-0-5-1.md
new file mode 100644
index 000000000..c5c935ed6
--- /dev/null
+++ b/releases/_posts/2012-10-11-spark-release-0-5-1.md
@@ -0,0 +1,46 @@
---
layout: post
title: Spark Release 0.5.1
categories:
- Releases
tags: []
status: publish
type: post
published: true
meta:
  _edit_last: '1'
---
Spark 0.5.1 is a maintenance release that adds several important bug fixes and usability features. You can download it as a tar.gz file.

<h3>Maven Publishing</h3>

Spark is now available in Maven Central, making it easier to link it into your programs without having to build it yourself. Use the following Maven identifiers to add it to a project:

<h3>Scala 2.9.2</h3>

Spark now builds against Scala 2.9.2 by default.

<h3>Improved Accumulators</h3>

The new Accumulable class generalizes Accumulators to the case where the type being accumulated is not the same as the type of the elements being added (e.g. you wish to accumulate a collection, such as a Set, by adding individual elements). This interface is also more efficient because it avoids creating temporary objects. (Contributed by Imran Rashid.)
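
To make the idea concrete, here is a sketch of accumulating a set of distinct values seen across the cluster. The AccumulableParam method names below follow the shape this interface took in later Spark releases, so treat them, along with the local master and pre-1.0 package layout, as assumptions rather than the exact 0.5.1 signatures.

```scala
import scala.collection.mutable
import spark.{AccumulableParam, SparkContext}

// Describes how to grow a HashSet[String] by single String elements and how to
// merge two partial sets (method names as in later releases; assumed here).
implicit object StringSetParam extends AccumulableParam[mutable.HashSet[String], String] {
  def addAccumulator(set: mutable.HashSet[String], s: String) = { set += s; set }
  def addInPlace(a: mutable.HashSet[String], b: mutable.HashSet[String]) = { a ++= b; a }
  def zero(initial: mutable.HashSet[String]) = new mutable.HashSet[String]
}

val sc = new SparkContext("local", "AccumulableExample")
val seen = sc.accumulable(new mutable.HashSet[String])
sc.parallelize(Seq("a", "b", "a", "c")).foreach(word => seen += word)
seen.value  // HashSet(a, b, c)
```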

<h3>Bug Fixes</h3>

<h3>EC2 Improvements</h3>

Spark's EC2 launch script now configures Spark's memory limit automatically based on the machine's available RAM.

diff --git a/releases/_posts/2012-10-15-spark-release-0-6-0.md b/releases/_posts/2012-10-15-spark-release-0-6-0.md
new file mode 100644
index 000000000..fb17f3037
--- /dev/null
+++ b/releases/_posts/2012-10-15-spark-release-0-6-0.md
@@ -0,0 +1,90 @@
---
layout: post
title: Spark Release 0.6.0
categories:
- Releases
tags: []
status: publish
type: post
published: true
meta:
  _edit_last: '4'
---
Spark 0.6.0 is a major release that brings several new features, architectural changes, and performance enhancements. The most visible additions are a standalone deploy mode, a Java API, and expanded documentation, but there are also numerous other changes under the hood that improve performance in some cases by as much as 2x.

You can download this release as either a source package (2 MB tar.gz) or a prebuilt package (48 MB tar.gz).

<h3>Simpler Deployment</h3>

In addition to running on Mesos, Spark now has a standalone deploy mode that lets you quickly launch a cluster without installing an external cluster manager. The standalone mode only needs Java installed on each machine, and Spark deployed to it.

In addition, there is experimental support for running on YARN (Hadoop NextGen), currently in a separate branch.
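
Connecting an application to a standalone cluster is then just a matter of pointing the SparkContext at the master's URL. The host name and port below are placeholders (7077 is the default in later releases), and the pre-1.0 `spark` package name is assumed.

```scala
import spark.SparkContext  // pre-1.0 package layout (assumed)

// "spark://<master-host>:<port>" addresses the standalone master; host and port here are made up.
val sc = new SparkContext("spark://master.example.com:7077", "StandaloneApp")
sc.parallelize(1 to 1000).map(_ * 2).count()
```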

<h3>Java API</h3>

Java programmers can now use Spark through a new Java API layer. This layer makes available all of Spark's features, including parallel transformations, distributed datasets, broadcast variables, and accumulators, in a Java-friendly manner.

<h3>Expanded Documentation</h3>

Spark's documentation has been expanded with a new quick start guide, additional deployment instructions, configuration guide, tuning guide, and improved Scaladoc API documentation.

<h3>Engine Changes</h3>

Under the hood, Spark 0.6 has new, custom storage and communication layers brought in from the upcoming Spark Streaming project. These can improve performance over past versions by as much as 2x. Specifically:

<h3>New APIs</h3>

<h3>Enhanced Debugging</h3>

Spark's logs now indicate which operation in your program each RDD and job belongs to, making it easier to tie problems back to the part of your code that caused them.

<h3>Maven Artifacts</h3>

Spark is now available in Maven Central, making it easier to link it into your programs without having to build it yourself. Use the following Maven identifiers to add it to a project:

<h3>Compatibility</h3>

This release is source-compatible with Spark 0.5 programs, but you will need to recompile them against 0.6. In addition, the configuration for caching has changed: instead of having a spark.cache.class parameter that sets one caching strategy for all RDDs, you can now set a per-RDD storage level. Spark will warn if you try to set spark.cache.class.
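
For example, caching is now requested per RDD rather than through one global setting. A minimal sketch follows; the storage level names and the pre-1.0 package layout are taken from later releases and assumed here.

```scala
import spark.SparkContext
import spark.storage.StorageLevel

val sc = new SparkContext("local", "StorageLevelExample")
val lines = sc.textFile("hdfs://...")

lines.persist(StorageLevel.MEMORY_ONLY)    // keep this RDD in memory
val lengths = lines.map(_.length).cache()  // cache() is shorthand for the default storage level
```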

<h3>Credits</h3>

Spark 0.6 was the work of a large set of new contributors from Berkeley and outside.

Thanks also to all the Spark users who have diligently suggested features or reported bugs.

diff --git a/releases/_posts/2012-11-22-spark-release-0-5-2.md b/releases/_posts/2012-11-22-spark-release-0-5-2.md new file mode 100644 index 000000000..794f3521d --- /dev/null +++ b/releases/_posts/2012-11-22-spark-release-0-5-2.md @@ -0,0 +1,15 @@ +--- +layout: post +title: Spark Release 0.5.2 +categories: +- Releases +tags: [] +status: publish +type: post +published: true +meta: + _edit_last: '1' +--- +Spark 0.5.2 is a minor release, whose main addition is to allow Spark to compile against Hadoop 2 distributions. To do this, edit project/SparkBuild.scala and change both the HADOOP_VERSION and HADOOP_MAJOR_VERSION variables, then recompile Spark. This change was contributed by Thomas Dudziak. + +You can download Spark 0.5.2 as a tar.gz file (2 MB). diff --git a/releases/_posts/2012-11-22-spark-release-0-6-1.md b/releases/_posts/2012-11-22-spark-release-0-6-1.md new file mode 100644 index 000000000..6a0799429 --- /dev/null +++ b/releases/_posts/2012-11-22-spark-release-0-6-1.md @@ -0,0 +1,30 @@ +--- +layout: post +title: Spark Release 0.6.1 +categories: +- Releases +tags: [] +status: publish +type: post +published: true +meta: + _edit_last: '4' +--- +Spark 0.6.1 is a maintenance release that contains several important bug fixes and performance improvements. You can download it as a source package (2.4 MB tar.gz) or prebuilt package (48 MB tar.gz). + +The fixes and improvements in this version include: + + +We recommend that all Spark 0.6 users update to this maintenance release. diff --git a/releases/_posts/2013-02-07-spark-release-0-6-2.md b/releases/_posts/2013-02-07-spark-release-0-6-2.md new file mode 100644 index 000000000..e44c72a70 --- /dev/null +++ b/releases/_posts/2013-02-07-spark-release-0-6-2.md @@ -0,0 +1,43 @@ +--- +layout: post +title: Spark Release 0.6.2 +categories: [] +tags: [] +status: publish +type: post +published: true +meta: + _edit_last: '4' + _wpas_done_all: '1' +--- +Spark 0.6.2 is a maintenance release that contains several bug fixes and usability improvements. You can download it as a source package (2.5 MB tar.gz) or prebuilt package (48 MB tar.gz). + +We recommend that all Spark 0.6 users update to this maintenance release. + +The fixes and improvements in this version include: + + +In total, eleven people contributed to this release: + diff --git a/releases/_posts/2013-02-27-spark-release-0-7-0.md b/releases/_posts/2013-02-27-spark-release-0-7-0.md new file mode 100644 index 000000000..ffb48a9e8 --- /dev/null +++ b/releases/_posts/2013-02-27-spark-release-0-7-0.md @@ -0,0 +1,112 @@ +--- +layout: post +title: Spark Release 0.7.0 +categories: [] +tags: [] +status: publish +type: post +published: true +meta: + _edit_last: '4' + _wpas_done_all: '1' +--- +The Spark team is proud to release version 0.7.0, a new major release that brings several new features. Most notable are a Python API for Spark and an alpha of Spark Streaming. (Details on Spark Streaming can also be found in this technical report.) The release also adds numerous other improvements across the board. Overall, this is our biggest release to date, with 31 contributors, of which 20 were external to Berkeley. + +You can download Spark 0.7.0 as either a source package (4 MB tar.gz) or prebuilt package (60 MB tar.gz). + +

<h3>Python API</h3>

Spark 0.7 adds a Python API called PySpark that makes it possible to use Spark from Python, both in standalone programs and in interactive Python shells. It uses the standard CPython runtime, so your programs can call into native libraries like NumPy and SciPy. Like the Scala and Java APIs, PySpark will automatically ship functions from your main program, along with the variables they depend on, to the cluster. PySpark supports most Spark features, including RDDs, accumulators, broadcast variables, and HDFS input and output.

<h3>Spark Streaming Alpha</h3>

Spark Streaming is a new extension of Spark that adds near-real-time processing capability. It offers a simple and high-level API, where users can transform streams using parallel operations like map, filter, reduce, and new sliding window functions. It automatically distributes work over a cluster and provides efficient fault recovery with exactly-once semantics for transformations, without relying on costly transactions to an external system. Spark Streaming is described in more detail in these slides and our technical report. This release is our first alpha of Spark Streaming, with most of the functionality implemented and APIs in Java and Scala.
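
A word-count sketch in the spirit of the streaming examples from this period; the constructor, the `spark.streaming` package name, and the socket source reflect how the API looked around this time and should be treated as assumptions rather than the exact alpha signatures.

```scala
import spark.streaming.{Seconds, StreamingContext}
import spark.streaming.StreamingContext._  // pair-DStream operations such as reduceByKey

// Count words arriving on a TCP socket in 1-second batches (host and port are placeholders).
val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)
val counts = lines.flatMap(_.split(" "))
                  .map(word => (word, 1))
                  .reduceByKey(_ + _)
counts.print()
ssc.start()
```

Feeding the example is as simple as running a small text server on the chosen port, for instance `nc -lk 9999`.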

<h3>Memory Dashboard</h3>

Spark jobs now launch a web dashboard for monitoring the memory usage of each distributed dataset (RDD) in the program. Look for lines like this in your log:

    15:08:44 INFO BlockManagerUI: Started BlockManager web UI at http://mbk.local:63814

You can also control which port to use through the spark.ui.port property.
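
In this era Spark settings were typically passed as Java system properties before the SparkContext was created, so pinning the dashboard to a fixed port looks roughly like this (the port number is arbitrary and the pre-1.0 package name is assumed):

```scala
// Must be set before the SparkContext (and thus the web UI) starts.
System.setProperty("spark.ui.port", "33000")
val sc = new spark.SparkContext("local", "DashboardExample")
```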

<h3>Maven Build</h3>

Spark can now be built using Maven in addition to SBT. The Maven build enables easier publishing to repositories of your choice, easy selection of Hadoop versions using a Maven profile (-Phadoop1 or -Phadoop2), and Debian packaging via mvn -Phadoop1,deb install.

<h3>New Operations</h3>

This release adds several RDD transformations, including keys, values, keyBy, subtract, coalesce, and zip. It also adds SparkContext.hadoopConfiguration, which lets programs configure Hadoop input/output settings globally across operations. Finally, it adds the RDD.toDebugString() method, which prints an RDD's lineage graph for troubleshooting.
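
A quick tour of the new transformations; the local master, sample data, and pre-1.0 `spark` package layout are assumptions for illustration, and result ordering may vary.

```scala
import spark.SparkContext
import spark.SparkContext._

val sc = new SparkContext("local", "NewOpsExample")

val pairs = sc.parallelize(Seq(1 -> "a", 2 -> "b", 3 -> "c"))
pairs.keys.collect()                              // Array(1, 2, 3)
pairs.values.collect()                            // Array(a, b, c)

val words = sc.parallelize(Seq("spark", "rdd", "shuffle"))
words.keyBy(_.length).collect()                   // Array((5,spark), (3,rdd), (7,shuffle))

val nums = sc.parallelize(1 to 10, 8)
nums.subtract(sc.parallelize(4 to 10)).collect()  // Array(1, 2, 3)
val fewer = nums.coalesce(2)                      // shrink from 8 to 2 partitions without a full shuffle
nums.zip(nums.map(n => n * n)).collect()          // Array((1,1), (2,4), ..., (10,100))

println(fewer.toDebugString)                      // print the RDD's lineage graph
```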

<h3>EC2 Improvements</h3>

<h3>Other Improvements</h3>

<h3>Compatibility</h3>

This release is API-compatible with Spark 0.6 programs, but the following features changed slightly:

<h3>Credits</h3>

+ +Spark 0.7 was the work of many contributors from Berkeley and outside---in total, 31 different contributors, of which 20 were from outside Berkeley. Here are the people who contributed, along with areas they worked on: + + + +Thanks to everyone who contributed! diff --git a/releases/_posts/2013-06-02-spark-release-0-7-2.md b/releases/_posts/2013-06-02-spark-release-0-7-2.md new file mode 100644 index 000000000..9b1ed38f0 --- /dev/null +++ b/releases/_posts/2013-06-02-spark-release-0-7-2.md @@ -0,0 +1,56 @@ +--- +layout: post +title: Spark Release 0.7.2 +categories: [] +tags: [] +status: publish +type: post +published: true +meta: + _edit_last: '4' + _wpas_done_all: '1' +--- +Spark 0.7.2 is a maintenance release that contains multiple bug fixes and improvements. You can download it as a source package (4 MB tar.gz) or get prebuilt packages for Hadoop 1 / CDH3 or CDH 4 (61 MB tar.gz). + + +We recommend that all users update to this maintenance release. + + +The fixes and improvements in this version include: + + +The following people contributed to this release: + + +We thank everyone who helped with this release, and hope to see more contributions from you in the future! diff --git a/releases/_posts/2013-07-16-spark-release-0-7-3.md b/releases/_posts/2013-07-16-spark-release-0-7-3.md new file mode 100644 index 000000000..39b35fb8f --- /dev/null +++ b/releases/_posts/2013-07-16-spark-release-0-7-3.md @@ -0,0 +1,49 @@ +--- +layout: post +title: Spark Release 0.7.3 +categories: +- Releases +tags: [] +status: publish +type: post +published: true +meta: + _edit_last: '4' + _wpas_done_all: '1' +--- +Spark 0.7.3 is a maintenance release with several bug fixes, performance fixes, and new features. You can download it as a source package (4 MB tar.gz) or get prebuilt packages for Hadoop 1 / CDH3 or for CDH 4 (61 MB tar.gz). + +We recommend that all users update to this maintenance release. + +The improvements in this release include: + + + +The following people contributed to this release: + + -- cgit v1.2.3