Spark 1.0.1 released | Apache Spark

Spark 1.0.1 released (Jul 11, 2014)

Spark Release 1.0.1

Spark 1.0.1 is a maintenance release with several stability fixes and a few new features in Spark’s SQL (alpha) library. This release is based on the branch-1.0 maintenance branch of Spark. We recommend users follow the head of this branch to get the most recent stable version of Spark.

You can download Spark 1.0.1 as either a source package (5 MB tgz) or a prebuilt package for Hadoop 1 / CDH3, CDH4, or Hadoop 2 / CDH5 / HDP2 (160 MB tgz). Release signatures and checksums are available at the official Apache download site.

Fixes

Spark 1.0.1 contains stability fixes in several components. Some of the more important fixes are highlighted below. You can visit the Spark issue tracker for an exhaustive list of fixes.

Spark Core

Issue with missing keys during external aggregations (SPARK-2043)
Issue during job failures in Mesos mode (SPARK-1749)
Error when defining case classes in Scala shell (SPARK-1199)
Proper support for r3.xlarge instances on AWS (SPARK-1790)

PySpark

Issue causing crashes when large numbers of tasks finish quickly (SPARK-2282)
Issue importing MLlib in YARN-client mode (SPARK-2172)
Incorrect behavior when hashing None (SPARK-1468)
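
The note on hashing None (SPARK-1468) gives no detail, so the following is only a sketch of the general idea, assuming the problem is that Python's built-in hash of None can differ between worker processes and so send the same key to different partitions. The names `stable_hash` and `partition_for` are hypothetical and are not taken from the Spark source.

```python
def stable_hash(key):
    """Hash that gives None a fixed value instead of a process-dependent one.

    Hypothetical helper for illustration; not Spark's actual implementation.
    """
    if key is None:
        return 0
    if isinstance(key, tuple):
        # Combine element hashes so that tuples containing None are stable too.
        h = 0x345678
        for item in key:
            h = ((h ^ stable_hash(item)) * 1000003) & 0xFFFFFFFF
        return h ^ len(key)
    # Other values fall back to the built-in hash. Note that string hashes
    # can still vary between Python 3 processes unless PYTHONHASHSEED is fixed.
    return hash(key)


def partition_for(key, num_partitions):
    """Pick a partition index in [0, num_partitions) for a key."""
    return stable_hash(key) % num_partitions


if __name__ == "__main__":
    # None always lands in partition 0, whichever process computes it.
    print(partition_for(None, 4))
    print(partition_for((None, 1), 8))
```

With a hash like this, every worker agrees on the partition for a None key, which is what grouping and join operations rely on.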

MLlib

Added compatibility for numpy 1.4 (SPARK-2091)
Concurrency issue in random sampler (SPARK-2251)
NotSerializable exception in ALS (SPARK-1977)

Streaming

Key not found when slow receiver starts (SPARK-2009)
Resource clean-up with KafkaInputDStream (SPARK-2034)
Issue with Flume events larger than 1020 bytes (SPARK-1916)

SparkSQL Features

Support for querying JSON datasets (SPARK-2060).
Improved reading and writing Parquet data, including support for nested records and arrays (SPARK-1293, SPARK-2195, SPARK-1913, and SPARK-1487).
Improved support for SQL commands (CACHE TABLE, DESCRIBE, SHOW TABLES) (SPARK-1968, SPARK-2128, and SPARK-1704).
Support for SQL specific configuration (initially used for setting number of partitions) (SPARK-1508).
Idempotence for DDL operations (SPARK-2191).

Known Issues

This release contains one known issue: multi-statement lines in the REPL with internal references (> val x = 10; val y = x + 10) produce exceptions (SPARK-2452). This will be fixed shortly on the 1.0 branch; the fix will be included in the 1.0.2 release.

Contributors

The following developers contributed to this release:

Aaron Davidson – bug fixes in PySpark and Spark core
Ali Ghodsi – documentation update
Anant – compatibility fix for spark-ec2 script
Anatoli Fomenko – MLlib doc fix
Andre Schumacher – nested Parquet data
Andrew Ash – documentation
Andrew Or – bug fixes and documentation
Ankur Dave – bug fixes
Arkadiusz Komarzewski – doc fix
Baishuo – SQL fix
Chen Chao – comment fix and bug fix
Cheng Hao – SQL features
Cheng Lian – SQL features
Christian Tzolov – build improvement
Clément MATHIEU – doc updates
CodingCat – doc updates and bug fix
Colin McCabe – bug fix
Daoyuan – SQL joins
David Lemieux – bug fix
Derek Ma – bug fix
Doris Xin – bug fix
Erik Selin – PySpark fix
Gang Bai – bug fix
Guoqiang Li – bug fixes
Henry Saputra – documentation
Jiang – doc fix
Joy Yoj – bug fix
Jyotiska NK – test improvement
Kan Zhang – PySpark SQL features
Kay Ousterhout – documentation fix
LY Lai – bug fix
Lars Albertsson – bug fix
Lei Zhang – SQL fix and feature
Mark Hamstra – bug fix
Matei Zaharia – doc updates and bug fix
Matthew Farrellee – bug fixes
Michael Armbrust – SQL features and fixes
Neville Li – bug fix
Nick Chammas – doc fix
Ori Kremer – bug fix
Patrick Wendell – documentation and release manager
Prashant Sharma – bug and doc fixes
Qiuzhuang.Lian – bug fix
Raymond Liu – bug fix
Ravikanth Nawada – bug fixes
Reynold Xin – bug and doc fixes
Sameer Agarwal – optimization
Sandy Ryza – doc fix
Sean Owen – bug fix
Sebastien Rainville – bug fix
Shixiong Zhu – code clean-up
Szul, Piotr – bug fix
Takuya UESHIN – bug fixes and SQL features
Thomas Graves – bug fix
Uri Laserson – bug fix
Vadim Chekan – bug fix
Varakhedi Sujeet – ec2 r3 support
Vlad – doc fix
Wang Lianhui – bug fix
Wenchen Fan – optimization
William Benton – SQL feature
Xi Liu – SQL feature
Xiangrui Meng – bug fix
Ximo Guanter Gonzalbez – SQL feature
Yadid Ayzenberg – doc fix
Yijie Shen – bug fix
Yin Huai – JSON support and bug fixes
Zhen Peng – bug fix
Zichuan Ye – ec2 fixes
Zongheng Yang – SQL fixes

Thanks to everyone who contributed!
