As we continue developing Spark, we would love to get feedback from users and hear what you’d like us to work on next. We’ve decided that a good way to do that is a survey – we hope to run this at regular intervals. If you have a few minutes to participate, fill in the survey here. Your time is greatly appreciated.
Spark News Archive
Fourth Spark screencast released
We have released the next screencast, A Standalone Job in Scala, which takes you beyond the Spark shell and helps you write your first standalone Spark job.
Registration open for AMP Camp training camp in Berkeley
Want to learn how to use Spark, Shark, GraphX, and related technologies in person? The AMP Lab is hosting a two-day training workshop on these technologies on August 29th and 30th in Berkeley. The workshop will include tutorials, talks from users, and over four hours of hands-on exercises. Registration is now open on the AMP Camp website, for a price of $250 per person. We recommend signing up early because last year’s workshop sold out.
Spark mailing lists moving to Apache
As part of the Spark project's recent move to Apache, we are planning to migrate the mailing lists to Apache infrastructure this month. The existing Google groups will become read-only on September 1, 2013. To keep receiving updates about Spark or to participate in development discussions, please subscribe to the following lists:
- user@spark.incubator.apache.org -- for usage questions, help, and announcements. (subscribe) (archives)
- dev@spark.incubator.apache.org -- for people who want to contribute code to Spark. (subscribe) (archives)
Most users will probably want the User list, but individuals interested in contributing code to the project should also subscribe to the Dev list.
Spark 0.7.3 released
We’ve just posted Spark Release 0.7.3, a maintenance release that contains several fixes, including streaming API updates and new functionality for adding JARs to a spark-shell session. We recommend that all users update to this release. Visit the release notes to read about the new features, or download the release today.
Spark featured in Wired
Spark, its creators at the AMP Lab, and some of its users were featured in a Wired Enterprise article a few days ago. Read on to learn a little about how Spark is being used in industry.
Spark accepted into Apache Incubator
Spark was recently accepted into the Apache Incubator, which will serve as the long-term home for the project. While moving the source code and issue tracking to Apache will take some time, we are excited to be joining the community at Apache. Stay tuned on this site for updates on how the project hosting will change.
Spark 0.7.2 released
We’re happy to announce the release of Spark 0.7.2, a new maintenance release that includes several bug fixes and improvements, as well as new code examples and API features. We recommend that all users update to this release. Head over to the release notes to read about the new features, or download the release today.
Spark screencasts published
We have released the first two screencasts in a series of short hands-on video training courses we will be publishing to help new users get up and running with Spark in minutes.
The first Spark screencast is called First Steps With Spark and walks you through downloading and building Spark, as well as using the Spark shell, all in less than 10 minutes!
The second screencast is a 2-minute overview of the Spark documentation.
We hope you find these screencasts useful.
Strata exercises now available online
At this year’s Strata conference, the AMP Lab hosted a full day of tutorials on Spark, Shark, and Spark Streaming, including online exercises on Amazon EC2. Those exercises are now available online, letting you learn Spark and Shark at your own pace on an EC2 cluster with real data. They are a great resource for learning the systems. You can also find slides from the Strata tutorials online, as well as videos from the AMP Camp workshop we held at Berkeley in August.
Spark 0.7.0 released
We’re proud to announce the release of Spark 0.7.0, a new major version of Spark that adds several key features, including a Python API for Spark and an alpha of Spark Streaming. This release is the result of the largest group of contributors yet behind a Spark release – 31 contributors from inside and outside Berkeley. Head over to the release notes to read more about the new features, or download the release today.
Spark/Shark Tutorial for Amazon EMR
This weekend, Amazon posted an article and code that make it easy to launch Spark and Shark on Elastic MapReduce. The article includes examples of how to run both interactive Scala commands and SQL queries from Shark on data in S3. Head over to the Amazon article for details. We’re very excited because, to our knowledge, this makes Spark the first non-Hadoop engine that you can launch with EMR.
Spark 0.6.2 released
We recently released Spark 0.6.2, a new version of Spark. This is a maintenance release that includes several bug fixes and usability improvements (see the release notes). We recommend that all users upgrade to this release.
Spark tips from Quantifind
Quantifind, one of the Bay Area companies that has been using Spark for predictive analytics, recently posted two useful entries on working with Spark in their tech blog:
Thanks for sharing these, and we look forward to seeing others!
Video up from first Spark development meetup
On December 18th, we held the first of a series of Spark development meetups, for people interested in learning the Spark codebase and contributing to the project. There was quite a bit more demand than we anticipated, with over 80 people signing up and 64 attending. The first meetup was an introduction to Spark internals. Thanks to one of the attendees, there’s now a video of the meetup on YouTube. We’ve also posted the slides. Look to see more development meetups on Spark and Shark in the future.
Spark and Shark in the news
Recently, we’ve seen quite a bit of coverage of both Spark and Shark in the news. I wanted to list some of the more recent articles, for readers interested in learning more.
- Curt Monash, editor of the popular DBMS2 blog, wrote a great introduction to Spark and Shark, as well as a more detailed technical overview.
- Silicon Angle covered Spark and Shark after our presentation at Amazon re:Invent.
- Datanami highlighted Shark in its survey of big data research projects.
- O'Reilly's Strata blog recently covered Spark, Shark, and the Spark 0.6 release.
- DataInformed interviewed two Spark users and wrote about their applications in anomaly detection, predictive analytics and data mining.
In other news, there will be a full day of tutorials on Spark and Shark at the O’Reilly Strata conference in February. They include a three-hour introduction to Spark, Shark, and BDAS on Tuesday morning, and a three-hour hands-on exercise session.
Spark 0.6.1 and 0.5.2 out
Today we’ve made available two maintenance releases for Spark: 0.6.1 and 0.5.2. They both contain important bug fixes as well as some new features, such as the ability to build against Hadoop 2 distributions. We recommend that users update to the latest version for their branch; for new users, we recommend 0.6.1.
Spark version 0.6.0 released
Spark version 0.6.0 was released today, a major release that brings a wide range of performance improvements and new features, including a simpler standalone deploy mode and a Java API. Read more about it in the release notes.
Spark wins Best Paper Award at USENIX NSDI
Our paper on Spark won the Best Paper Award at the USENIX NSDI conference. You can see a video of the talk, as well as slides, online on the NSDI website.
We've started hosting a Bay Area Spark User Meetup
We’ve started hosting a regular Bay Area Spark User Meetup. Sign up on the meetup.com page to be notified about events and meet other Spark developers and users.