author | Patrick Wendell <pwendell@apache.org> | 2014-07-12 00:41:27 +0000 |
---|---|---|
committer | Patrick Wendell <pwendell@apache.org> | 2014-07-12 00:41:27 +0000 |
commit | ac807a0867562c68243f3d93df6c2f9600d2d799 (patch) | |
tree | c1bc70b3383d115e82de9c7f6fe7726ab77c3bbe /releases/_posts/2014-07-11-spark-release-1-0-1.md | |
parent | 0beac4e243f85e71554fe04093b09eb1745fea82 (diff) | |
download | spark-website-ac807a0867562c68243f3d93df6c2f9600d2d799.tar.gz spark-website-ac807a0867562c68243f3d93df6c2f9600d2d799.tar.bz2 spark-website-ac807a0867562c68243f3d93df6c2f9600d2d799.zip |
Adding 1.0.1 release of Spark.
Diffstat (limited to 'releases/_posts/2014-07-11-spark-release-1-0-1.md')
-rw-r--r-- | releases/_posts/2014-07-11-spark-release-1-0-1.md | 132 |
1 files changed, 132 insertions, 0 deletions
diff --git a/releases/_posts/2014-07-11-spark-release-1-0-1.md b/releases/_posts/2014-07-11-spark-release-1-0-1.md
new file mode 100644
index 000000000..1858ffcd6
--- /dev/null
+++ b/releases/_posts/2014-07-11-spark-release-1-0-1.md
@@ -0,0 +1,132 @@
+---
+layout: post
+title: Spark Release 1.0.1
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+
+Spark 1.0.1 is a maintenance release with several stability fixes and a few new features in Spark’s SQL (alpha) library. This release is based on the [branch-1.0](https://github.com/apache/spark/tree/branch-1.0) maintenance branch of Spark. We recommend users follow the head of this branch to get the most recent stable version of Spark.
+
+You can download Spark 1.0.1 as either a
+<a href="http://d3kbcqa49mib13.cloudfront.net/spark-1.0.1.tgz" onClick="trackOutboundLink(this, 'Release Download Links', 'cloudfront_spark-1.0.1.tgz'); return false;">source package</a>
+(5 MB tgz) or a prebuilt package for
+<a href="http://d3kbcqa49mib13.cloudfront.net/spark-1.0.1-bin-hadoop1.tgz" onClick="trackOutboundLink(this, 'Release Download Links', 'cloudfront_spark-1.0.1-bin-hadoop1.tgz'); return false;">Hadoop 1 / CDH3</a>,
+<a href="http://d3kbcqa49mib13.cloudfront.net/spark-1.0.1-bin-cdh4.tgz" onClick="trackOutboundLink(this, 'Release Download Links', 'cloudfront_spark-1.0.1-bin-cdh4.tgz'); return false;">CDH4</a>, or
+<a href="http://d3kbcqa49mib13.cloudfront.net/spark-1.0.1-bin-hadoop2.tgz" onClick="trackOutboundLink(this, 'Release Download Links', 'cloudfront_spark-1.0.1-bin-hadoop2.tgz'); return false;">Hadoop 2 / CDH5 / HDP2</a>
+(160 MB tgz). Release signatures and checksums are available at the official [Apache download site](http://www.apache.org/dist/spark/spark-1.0.1/).
+
+### Fixes
+Spark 1.0.1 contains stability fixes in several components. Some of the more important fixes are highlighted below. You can visit the [Spark issue tracker](http://s.apache.org/5zh) for an exhaustive list of fixes.
+
+#### Spark Core
+ - Issue with missing keys during external aggregations ([SPARK-2043](https://issues.apache.org/jira/browse/SPARK-2043))
+ - Issue during job failures in Mesos mode ([SPARK-1749](https://issues.apache.org/jira/browse/SPARK-1749))
+ - Error when defining case classes in Scala shell ([SPARK-1199](https://issues.apache.org/jira/browse/SPARK-1199))
+ - Proper support for r3.xlarge instances on AWS ([SPARK-1790](https://issues.apache.org/jira/browse/SPARK-1790))
+
+#### PySpark
+ - Issue causing crashes when large numbers of tasks finish quickly ([SPARK-2282](https://issues.apache.org/jira/browse/SPARK-2282))
+ - Issue importing MLlib in YARN-client mode ([SPARK-2172](https://issues.apache.org/jira/browse/SPARK-2172))
+ - Incorrect behavior when hashing None ([SPARK-1468](https://issues.apache.org/jira/browse/SPARK-1468))
+
+#### MLlib
+ - Added compatibility for numpy 1.4 ([SPARK-2091](https://issues.apache.org/jira/browse/SPARK-2091))
+ - Concurrency issue in random sampler ([SPARK-2251](https://issues.apache.org/jira/browse/SPARK-2251))
+ - NotSerializable exception in ALS ([SPARK-1977](https://issues.apache.org/jira/browse/SPARK-1977))
+
+#### Streaming
+ - Key not found when slow receiver starts ([SPARK-2009](https://issues.apache.org/jira/browse/SPARK-2009))
+ - Resource clean-up with KafkaInputDStream ([SPARK-2034](https://issues.apache.org/jira/browse/SPARK-2034))
+ - Issue with Flume events larger than 1020 bytes ([SPARK-1916](https://issues.apache.org/jira/browse/SPARK-1916))
+
+### SparkSQL Features
+ - Support for querying JSON datasets ([SPARK-2060](https://issues.apache.org/jira/browse/SPARK-2060)).
+ - Improved reading and writing Parquet data, including support for nested records and arrays ([SPARK-1293](https://issues.apache.org/jira/browse/SPARK-1293), [SPARK-2195](https://issues.apache.org/jira/browse/SPARK-2195), [SPARK-1913](https://issues.apache.org/jira/browse/SPARK-1913), and [SPARK-1487](https://issues.apache.org/jira/browse/SPARK-1487)).
+ - Improved support for SQL commands (`CACHE TABLE`, `DESCRIBE`, `SHOW TABLES`) ([SPARK-1968](https://issues.apache.org/jira/browse/SPARK-1968), [SPARK-2128](https://issues.apache.org/jira/browse/SPARK-2128), and [SPARK-1704](https://issues.apache.org/jira/browse/SPARK-1704)).
+ - Support for SQL specific configuration (initially used for setting number of partitions) ([SPARK-1508](https://issues.apache.org/jira/browse/SPARK-1508)).
+ - Idempotence for DDL operations ([SPARK-2191](https://issues.apache.org/jira/browse/SPARK-2191)).
+
+### Known Issues
+This release contains one known issue: multi-statement lines in the REPL with internal references (`> val x = 10; val y = x + 10`) produce exceptions ([SPARK-2452](https://issues.apache.org/jira/browse/SPARK-2452)). This will be fixed shortly on the 1.0 branch; the fix will be included in the 1.0.2 release.
+
+### Contributors
+The following developers contributed to this release:
+
+ * Aaron Davidson -- bug fixes in PySpark and Spark core
+ * Ali Ghodsi -- documentation update
+ * Anant -- compatibility fix for spark-ec2 script
+ * Anatoli Fomenko -- MLlib doc fix
+ * Andre Schumacher -- nested Parquet data
+ * Andrew Ash -- documentation
+ * Andrew Or -- bug fixes and documentation
+ * Ankur Dave -- bug fixes
+ * Arkadiusz Komarzewski -- doc fix
+ * Baishuo -- SQL fix
+ * Chen Chao -- comment fix and bug fix
+ * Cheng Hao -- SQL features
+ * Cheng Lian -- SQL features
+ * Christian Tzolov -- build improvement
+ * Clément MATHIEU -- doc updates
+ * CodingCat -- doc updates and bug fix
+ * Colin McCabe -- bug fix
+ * Daoyuan -- SQL joins
+ * David Lemieux -- bug fix
+ * Derek Ma -- bug fix
+ * Doris Xin -- bug fix
+ * Erik Selin -- PySpark fix
+ * Gang Bai -- bug fix
+ * Guoqiang Li -- bug fixes
+ * Henry Saputra -- documentation
+ * Jiang -- doc fix
+ * Joy Yoj -- bug fix
+ * Jyotiska NK -- test improvement
+ * Kan Zhang -- PySpark SQL features
+ * Kay Ousterhout -- documentation fix
+ * LY Lai -- bug fix
+ * Lars Albertsson -- bug fix
+ * Lei Zhang -- SQL fix and feature
+ * Mark Hamstra -- bug fix
+ * Matei Zaharia -- doc updates and bug fix
+ * Matthew Farrellee -- bug fixes
+ * Michael Armbrust -- SQL features and fixes
+ * Neville Li -- bug fix
+ * Nick Chammas -- doc fix
+ * Ori Kremer -- bug fix
+ * Patrick Wendell -- documentation and release manager
+ * Prashant Sharma -- bug and doc fixes
+ * Qiuzhuang.Lian -- bug fix
+ * Raymond Liu -- bug fix
+ * Ravikanth Nawada -- bug fixes
+ * Reynold Xin -- bug and doc fixes
+ * Sameer Agarwal -- optimization
+ * Sandy Ryza -- doc fix
+ * Sean Owen -- bug fix
+ * Sebastien Rainville -- bug fix
+ * Shixiong Zhu -- code clean-up
+ * Szul, Piotr -- bug fix
+ * Takuya UESHIN -- bug fixes and SQL features
+ * Thomas Graves -- bug fix
+ * Uri Laserson -- bug fix
+ * Vadim Chekan -- bug fix
+ * Varakhedi Sujeet -- ec2 r3 support
+ * Vlad -- doc fix
+ * Wang Lianhui -- bug fix
+ * Wenchen Fan -- optimization
+ * William Benton -- SQL feature
+ * Xi Liu -- SQL feature
+ * Xiangrui Meng -- bug fix
+ * Ximo Guanter Gonzalbez -- SQL feature
+ * Yadid Ayzenberg -- doc fix
+ * Yijie Shen -- bug fix
+ * Yin Huai -- JSON support and bug fixes
+ * Zhen Peng -- bug fix
+ * Zichuan Ye -- ec2 fixes
+ * Zongheng Yang -- SQL fixes
+
+_Thanks to everyone who contributed!_