---
layout: post
title: Spark Release 1.0.1
categories: []
tags: []
status: publish
type: post
published: true
meta:
_edit_last: '4'
_wpas_done_all: '1'
---
Spark 1.0.1 is a maintenance release with several stability fixes and a few new features in Spark’s SQL (alpha) library. This release is based on the [branch-1.0](https://github.com/apache/spark/tree/branch-1.0) maintenance branch of Spark. We recommend users follow the head of this branch to get the most recent stable version of Spark.
You can download Spark 1.0.1 as either a source package (5 MB tgz) or a prebuilt package for Hadoop 1 / CDH3, CDH4, or Hadoop 2 / CDH5 / HDP2 (160 MB tgz). Release signatures and checksums are available at the official [Apache download site](http://www.apache.org/dist/spark/spark-1.0.1/).
### Fixes
Spark 1.0.1 contains stability fixes in several components. Some of the more important fixes are highlighted below. You can visit the [Spark issue tracker](http://s.apache.org/5zh) for an exhaustive list of fixes.
#### Spark Core
- Issue with missing keys during external aggregations ([SPARK-2043](https://issues.apache.org/jira/browse/SPARK-2043))
- Issue during job failures in Mesos mode ([SPARK-1749](https://issues.apache.org/jira/browse/SPARK-1749))
- Error when defining case classes in Scala shell ([SPARK-1199](https://issues.apache.org/jira/browse/SPARK-1199))
- Proper support for r3.xlarge instances on AWS ([SPARK-1790](https://issues.apache.org/jira/browse/SPARK-1790))
#### PySpark
- Issue causing crashes when large numbers of tasks finish quickly ([SPARK-2282](https://issues.apache.org/jira/browse/SPARK-2282))
- Issue importing MLlib in YARN-client mode ([SPARK-2172](https://issues.apache.org/jira/browse/SPARK-2172))
- Incorrect behavior when hashing None ([SPARK-1468](https://issues.apache.org/jira/browse/SPARK-1468))
#### MLlib
- Added compatibility for numpy 1.4 ([SPARK-2091](https://issues.apache.org/jira/browse/SPARK-2091))
- Concurrency issue in random sampler ([SPARK-2251](https://issues.apache.org/jira/browse/SPARK-2251))
- `NotSerializableException` in ALS ([SPARK-1977](https://issues.apache.org/jira/browse/SPARK-1977))
#### Streaming
- Key not found when slow receiver starts ([SPARK-2009](https://issues.apache.org/jira/browse/SPARK-2009))
- Resource clean-up with KafkaInputDStream ([SPARK-2034](https://issues.apache.org/jira/browse/SPARK-2034))
- Issue with Flume events larger than 1020 bytes ([SPARK-1916](https://issues.apache.org/jira/browse/SPARK-1916))
### Spark SQL Features
- Support for querying JSON datasets ([SPARK-2060](https://issues.apache.org/jira/browse/SPARK-2060)).
- Improved reading and writing Parquet data, including support for nested records and arrays ([SPARK-1293](https://issues.apache.org/jira/browse/SPARK-1293), [SPARK-2195](https://issues.apache.org/jira/browse/SPARK-2195), [SPARK-1913](https://issues.apache.org/jira/browse/SPARK-1913), and [SPARK-1487](https://issues.apache.org/jira/browse/SPARK-1487)).
- Improved support for SQL commands (`CACHE TABLE`, `DESCRIBE`, `SHOW TABLES`) ([SPARK-1968](https://issues.apache.org/jira/browse/SPARK-1968), [SPARK-2128](https://issues.apache.org/jira/browse/SPARK-2128), and [SPARK-1704](https://issues.apache.org/jira/browse/SPARK-1704)).
- Support for SQL-specific configuration, initially used for setting the number of partitions ([SPARK-1508](https://issues.apache.org/jira/browse/SPARK-1508)).
- Idempotence for DDL operations ([SPARK-2191](https://issues.apache.org/jira/browse/SPARK-2191)).
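The new JSON support and SQL commands can be tried from the Scala shell. The sketch below assumes a running `SparkContext` named `sc` (as provided by `spark-shell`) and a hypothetical `people.json` file with one JSON object per line:

```scala
// Sketch only: assumes Spark 1.0.1 on the classpath, a SparkContext `sc`,
// and a file people.json containing lines like {"name": "Alice", "age": 30}.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Load the JSON dataset; the schema is inferred automatically (SPARK-2060).
val people = sqlContext.jsonFile("people.json")
people.registerAsTable("people")

// Cache the table using the new command support (SPARK-1968), then query it.
sqlContext.sql("CACHE TABLE people")
sqlContext.sql("SELECT name FROM people WHERE age >= 21").collect().foreach(println)
```

Note that `registerAsTable` is the 1.0.x API; the file path and field names above are illustrative, not part of the release.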
### Known Issues
This release contains one known issue: multi-statement lines in the REPL that contain internal references (e.g. `val x = 10; val y = x + 10`) produce exceptions ([SPARK-2452](https://issues.apache.org/jira/browse/SPARK-2452)). This will be fixed shortly on the 1.0 branch; the fix will be included in the 1.0.2 release.
### Contributors
The following developers contributed to this release:
* Aaron Davidson -- bug fixes in PySpark and Spark core
* Ali Ghodsi -- documentation update
* Anant -- compatibility fix for spark-ec2 script
* Anatoli Fomenko -- MLlib doc fix
* Andre Schumacher -- nested Parquet data
* Andrew Ash -- documentation
* Andrew Or -- bug fixes and documentation
* Ankur Dave -- bug fixes
* Arkadiusz Komarzewski -- doc fix
* Baishuo -- SQL fix
* Chen Chao -- comment fix and bug fix
* Cheng Hao -- SQL features
* Cheng Lian -- SQL features
* Christian Tzolov -- build improvement
* Clément MATHIEU -- doc updates
* CodingCat -- doc updates and bug fix
* Colin McCabe -- bug fix
* Daoyuan -- SQL joins
* David Lemieux -- bug fix
* Derek Ma -- bug fix
* Doris Xin -- bug fix
* Erik Selin -- PySpark fix
* Gang Bai -- bug fix
* Guoqiang Li -- bug fixes
* Henry Saputra -- documentation
* Jiang -- doc fix
* Joy Yoj -- bug fix
* Jyotiska NK -- test improvement
* Kan Zhang -- PySpark SQL features
* Kay Ousterhout -- documentation fix
* LY Lai -- bug fix
* Lars Albertsson -- bug fix
* Lei Zhang -- SQL fix and feature
* Mark Hamstra -- bug fix
* Matei Zaharia -- doc updates and bug fix
* Matthew Farrellee -- bug fixes
* Michael Armbrust -- SQL features and fixes
* Neville Li -- bug fix
* Nick Chammas -- doc fix
* Ori Kremer -- bug fix
* Patrick Wendell -- documentation and release manager
* Prashant Sharma -- bug and doc fixes
* Qiuzhuang.Lian -- bug fix
* Raymond Liu -- bug fix
* Ravikanth Nawada -- bug fixes
* Reynold Xin -- bug and doc fixes
* Sameer Agarwal -- optimization
* Sandy Ryza -- doc fix
* Sean Owen -- bug fix
* Sebastien Rainville -- bug fix
* Shixiong Zhu -- code clean-up
* Szul, Piotr -- bug fix
* Takuya UESHIN -- bug fixes and SQL features
* Thomas Graves -- bug fix
* Uri Laserson -- bug fix
* Vadim Chekan -- bug fix
* Varakhedi Sujeet -- ec2 r3 support
* Vlad -- doc fix
* Wang Lianhui -- bug fix
* Wenchen Fan -- optimization
* William Benton -- SQL feature
* Xi Liu -- SQL feature
* Xiangrui Meng -- bug fix
* Ximo Guanter Gonzalbez -- SQL feature
* Yadid Ayzenberg -- doc fix
* Yijie Shen -- bug fix
* Yin Huai -- JSON support and bug fixes
* Zhen Peng -- bug fix
* Zichuan Ye -- ec2 fixes
* Zongheng Yang -- SQL fixes
_Thanks to everyone who contributed!_