From d6ebe19825e676953342d069738f0026b985215d Mon Sep 17 00:00:00 2001
From: Matei Alexandru Zaharia
Date: Wed, 20 Jan 2016 21:40:26 +0000
Subject: Add TM symbol to download page
---
downloads.md | 2 +-
site/documentation.html | 5 ++-
site/downloads.html | 2 +-
site/news/index.html | 36 +++++++++++++++++++---
site/news/spark-0-9-1-released.html | 2 +-
site/news/spark-0-9-2-released.html | 2 +-
site/news/spark-1-1-0-released.html | 2 +-
site/news/spark-1-2-2-released.html | 2 +-
site/news/spark-and-shark-in-the-news.html | 2 +-
.../news/spark-summit-east-2015-videos-posted.html | 2 +-
site/releases/spark-release-0-8-0.html | 4 +--
site/releases/spark-release-0-9-1.html | 20 ++++++------
site/releases/spark-release-1-0-1.html | 8 ++---
site/releases/spark-release-1-0-2.html | 2 +-
site/releases/spark-release-1-1-0.html | 6 ++--
site/releases/spark-release-1-2-0.html | 2 +-
site/releases/spark-release-1-3-0.html | 6 ++--
site/releases/spark-release-1-3-1.html | 6 ++--
site/releases/spark-release-1-4-0.html | 4 +--
site/releases/spark-release-1-5-0.html | 30 +++++++++---------
site/releases/spark-release-1-6-0.html | 20 ++++++------
21 files changed, 95 insertions(+), 70 deletions(-)
diff --git a/downloads.md b/downloads.md
index 8e02da77d..6ba1e97a3 100644
--- a/downloads.md
+++ b/downloads.md
@@ -14,7 +14,7 @@ $(document).ready(function() {
});
-## Download Spark
+## Download Spark™
The latest release of Spark is Spark 1.6.0, released on January 4, 2016
(release notes)
diff --git a/site/documentation.html b/site/documentation.html
index efd32194b..1051b9d2c 100644
--- a/site/documentation.html
+++ b/site/documentation.html
@@ -226,13 +226,12 @@
Meetup Talk Videos
-In addition to the videos listed below, you can also view all slides from Bay Area meetups here.
+In addition to the videos listed below, you can also view all slides from Bay Area meetups here.
-
+
Thanks for sharing this, and looking forward to seeing others!
+
diff --git a/site/news/index.html b/site/news/index.html
--- a/site/news/index.html
+++ b/site/news/index.html
@@ -733,6 +753,7 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
December 21, 2012
On December 18th, we held the first of a series of Spark development meetups, for people interested in learning the Spark codebase and contributing to the project. There was quite a bit more demand than we anticipated, with over 80 people signing up and 64 attending. The first meetup was an introduction to Spark internals. Thanks to one of the attendees, there’s now a video of the meetup on YouTube. We’ve also posted the slides. Look to see more development meetups on Spark and Shark in the future.
+
@@ -751,7 +772,8 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
DataInformed interviewed two Spark users and wrote about their applications in anomaly detection, predictive analytics and data mining.
-In other news, there will be a full day of tutorials on Spark and Shark at the O’Reilly Strata conference in February. They include a three-hour introduction to Spark, Shark and BDAS Tuesday morning, and a three-hour hands-on exercise session.
+In other news, there will be a full day of tutorials on Spark and Shark at the O’Reilly Strata conference in February. They include a three-hour introduction to Spark, Shark and BDAS Tuesday morning, and a three-hour hands-on exercise session.
+
@@ -761,6 +783,7 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
November 22, 2012
Today we’ve made available two maintenance releases for Spark: 0.6.1 and 0.5.2. They both contain important bug fixes as well as some new features, such as the ability to build against Hadoop 2 distributions. We recommend that users update to the latest version for their branch; for new users, we recommend 0.6.1.
+
@@ -770,6 +793,7 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
October 15, 2012
Spark version 0.6.0 was released today, a major release that brings a wide range of performance improvements and new features, including a simpler standalone deploy mode and a Java API. Read more about it in the release notes.
+
@@ -779,6 +803,7 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
April 25, 2012
Our paper on Spark won the Best Paper Award at the USENIX NSDI conference. You can see a video of the talk, as well as slides, online on the NSDI website.
+
@@ -788,6 +813,7 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
January 10, 2012
We’ve started hosting a regular Bay Area Spark User Meetup. Sign up on the meetup.com page to be notified about events and meet other Spark developers and users.
+
diff --git a/site/news/spark-0-9-1-released.html b/site/news/spark-0-9-1-released.html
index 526b2a2de..ea635d20c 100644
--- a/site/news/spark-0-9-1-released.html
+++ b/site/news/spark-0-9-1-released.html
@@ -173,7 +173,7 @@
We are happy to announce the availability of
Spark 0.9.1! Apache Spark 0.9.1 is a maintenance release with bug fixes, performance improvements, better stability with YARN and
improved parity of the Scala and Python API. We recommend all 0.9.0 users to upgrade to this stable release.
-Contributions to this release came from 37 developers.
+Contributions to this release came from 37 developers.
Visit the release notes
to read about the new features, or download the release today.
diff --git a/site/news/spark-0-9-2-released.html b/site/news/spark-0-9-2-released.html
index a1d39349d..379e78c5e 100644
--- a/site/news/spark-0-9-2-released.html
+++ b/site/news/spark-0-9-2-released.html
@@ -172,7 +172,7 @@
We are happy to announce the availability of
Spark 0.9.2! Apache Spark 0.9.2 is a maintenance release with bug fixes. We recommend all 0.9.x users to upgrade to this stable release.
-Contributions to this release came from 28 developers.
+Contributions to this release came from 28 developers.
Visit the release notes
to read about the new features, or download the release today.
diff --git a/site/news/spark-1-1-0-released.html b/site/news/spark-1-1-0-released.html
index ea1211013..86d5ddd63 100644
--- a/site/news/spark-1-1-0-released.html
+++ b/site/news/spark-1-1-0-released.html
@@ -172,7 +172,7 @@
We are happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is the second release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 171 developers!
-This release brings operational and performance improvements in Spark core including a new implementation of the Spark shuffle designed for very large scale workloads. Spark 1.1 adds significant extensions to the newest Spark modules, MLlib and Spark SQL. Spark SQL introduces a JDBC server, byte code generation for fast expression evaluation, a public types API, JSON support, and other features and optimizations. MLlib introduces a new statistics library along with several new algorithms and optimizations. Spark 1.1 also builds out Spark’s Python support and adds new components to the Spark Streaming module.
+This release brings operational and performance improvements in Spark core including a new implementation of the Spark shuffle designed for very large scale workloads. Spark 1.1 adds significant extensions to the newest Spark modules, MLlib and Spark SQL. Spark SQL introduces a JDBC server, byte code generation for fast expression evaluation, a public types API, JSON support, and other features and optimizations. MLlib introduces a new statistics library along with several new algorithms and optimizations. Spark 1.1 also builds out Spark’s Python support and adds new components to the Spark Streaming module.
Visit the release notes to read about the new features, or download the release today.
diff --git a/site/news/spark-1-2-2-released.html b/site/news/spark-1-2-2-released.html
index cec6ccab6..c64a7cac9 100644
--- a/site/news/spark-1-2-2-released.html
+++ b/site/news/spark-1-2-2-released.html
@@ -170,7 +170,7 @@
Spark 1.2.2 and 1.3.1 released
-We are happy to announce the availability of Spark 1.2.2 and Spark 1.3.1! These are both maintenance releases that collectively feature the work of more than 90 developers.
+We are happy to announce the availability of Spark 1.2.2 and Spark 1.3.1! These are both maintenance releases that collectively feature the work of more than 90 developers.
To download either release, visit the downloads page.
diff --git a/site/news/spark-and-shark-in-the-news.html b/site/news/spark-and-shark-in-the-news.html
index 1b72e219c..139074c46 100644
--- a/site/news/spark-and-shark-in-the-news.html
+++ b/site/news/spark-and-shark-in-the-news.html
@@ -180,7 +180,7 @@
DataInformed interviewed two Spark users and wrote about their applications in anomaly detection, predictive analytics and data mining.
-In other news, there will be a full day of tutorials on Spark and Shark at the O’Reilly Strata conference in February. They include a three-hour introduction to Spark, Shark and BDAS Tuesday morning, and a three-hour hands-on exercise session.
+In other news, there will be a full day of tutorials on Spark and Shark at the O’Reilly Strata conference in February. They include a three-hour introduction to Spark, Shark and BDAS Tuesday morning, and a three-hour hands-on exercise session.
diff --git a/site/news/spark-summit-east-2015-videos-posted.html b/site/news/spark-summit-east-2015-videos-posted.html
index 27e5ff19f..b2c93e64d 100644
--- a/site/news/spark-summit-east-2015-videos-posted.html
+++ b/site/news/spark-summit-east-2015-videos-posted.html
@@ -170,7 +170,7 @@
Spark Summit East 2015 Videos Posted
-The videos and slides for Spark Summit East 2015 are now all available online. Watch them to get the latest news from the Spark community as well as use cases and applications built on top.
+The videos and slides for Spark Summit East 2015 are now all available online. Watch them to get the latest news from the Spark community as well as use cases and applications built on top.
If you like what you see, consider joining us at the 2015 Spark Summit in San Francisco.
diff --git a/site/releases/spark-release-0-8-0.html b/site/releases/spark-release-0-8-0.html
index 51fd7045b..8e964b1a9 100644
--- a/site/releases/spark-release-0-8-0.html
+++ b/site/releases/spark-release-0-8-0.html
@@ -194,13 +194,13 @@
Spark’s internal job scheduler has been refactored and extended to include more sophisticated scheduling policies. In particular, a fair scheduler implementation now allows multiple users to share an instance of Spark, which helps users running shorter jobs to achieve good performance, even when longer-running jobs are running in parallel. Support for topology-aware scheduling has been extended, including the ability to take into account rack locality and support for multiple executors on a single machine.
Easier Deployment and Linking
-User programs can now link to Spark no matter which Hadoop version they need, without having to publish a version of spark-core specifically for that Hadoop version. An explanation of how to link against different Hadoop versions is provided here.
+User programs can now link to Spark no matter which Hadoop version they need, without having to publish a version of spark-core specifically for that Hadoop version. An explanation of how to link against different Hadoop versions is provided here.
Expanded EC2 Capabilities
Spark’s EC2 scripts now support launching in any availability zone. Support has also been added for EC2 instance types which use the newer “HVM” architecture. This includes the cluster compute (cc1/cc2) family of instance types. We’ve also added support for running newer versions of HDFS alongside Spark. Finally, we’ve added the ability to launch clusters with maintenance releases of Spark in addition to launching the newest release.
Improved Documentation
-This release adds documentation about cluster hardware provisioning and inter-operation with common Hadoop distributions. Docs are also included to cover the MLlib machine learning functions and new cluster monitoring features. Existing documentation has been updated to reflect changes in building and deploying Spark.
+This release adds documentation about cluster hardware provisioning and inter-operation with common Hadoop distributions. Docs are also included to cover the MLlib machine learning functions and new cluster monitoring features. Existing documentation has been updated to reflect changes in building and deploying Spark.
Other Improvements
diff --git a/site/releases/spark-release-0-9-1.html b/site/releases/spark-release-0-9-1.html
index 9f0e28139..656d383a4 100644
--- a/site/releases/spark-release-0-9-1.html
+++ b/site/releases/spark-release-0-9-1.html
@@ -185,9 +185,9 @@
- Fixed hash collision bug in external spilling [SPARK-1113]
- Fixed conflict with Spark’s log4j for users relying on other logging backends [SPARK-1190]
- Fixed Graphx missing from Spark assembly jar in maven builds
- - Fixed silent failures due to map output status exceeding Akka frame size [SPARK-1244]
- - Removed Spark’s unnecessary direct dependency on ASM [SPARK-782]
- - Removed metrics-ganglia from default build due to LGPL license conflict [SPARK-1167]
+ - Fixed silent failures due to map output status exceeding Akka frame size [SPARK-1244]
+ - Removed Spark’s unnecessary direct dependency on ASM [SPARK-782]
+ - Removed metrics-ganglia from default build due to LGPL license conflict [SPARK-1167]
- Fixed bug in distribution tarball not containing spark assembly jar [SPARK-1184]
- Fixed bug causing infinite NullPointerException failures due to a null in map output locations [SPARK-1124]
- Fixed bugs in post-job cleanup of scheduler’s data structures
@@ -203,7 +203,7 @@
- Fixed bug making Spark application stall when YARN registration fails [SPARK-1032]
- Race condition in getting HDFS delegation tokens in yarn-client mode [SPARK-1203]
- Fixed bug in yarn-client mode not exiting properly [SPARK-1049]
- - Fixed regression bug in ADD_JAR environment variable not correctly adding custom jars [SPARK-1089]
+ - Fixed regression bug in ADD_JAR environment variable not correctly adding custom jars [SPARK-1089]
Improvements to other deployment scenarios
@@ -214,19 +214,19 @@
Optimizations to MLLib
- - Optimized memory usage of ALS [MLLIB-25]
+ - Optimized memory usage of ALS [MLLIB-25]
- Optimized computation of YtY for implicit ALS [SPARK-1237]
- Support for negative implicit input in ALS [MLLIB-22]
- Setting of a random seed in ALS [SPARK-1238]
- - Faster construction of features with intercept [SPARK-1260]
+ - Faster construction of features with intercept [SPARK-1260]
- Check for intercept and weight in GLM’s addIntercept [SPARK-1327]
Bug fixes and better API parity for PySpark
- Fixed bug in Python de-pickling [SPARK-1135]
- - Fixed bug in serialization of strings longer than 64K [SPARK-1043]
- - Fixed bug that made jobs hang when base file is not available [SPARK-1025]
+ - Fixed bug in serialization of strings longer than 64K [SPARK-1043]
+ - Fixed bug that made jobs hang when base file is not available [SPARK-1025]
- Added Missing RDD operations to PySpark - top, zip, foldByKey, repartition, coalesce, getStorageLevel, setName and toDebugString
@@ -258,13 +258,13 @@
Kay Ousterhout - Multiple bug fixes in scheduler’s handling of task failures
Kousuke Saruta - Use of https to access github
Mark Grover - Bug fix in distribution tar.gz
- Matei Zaharia - Bug fixes in handling of task failures due to NPE, and cleaning up of scheduler data structures
+ Matei Zaharia - Bug fixes in handling of task failures due to NPE, and cleaning up of scheduler data structures
Nan Zhu - Bug fixes in PySpark RDD.takeSample and adding of JARs using ADD_JAR - and improvements to docs
Nick Lanham - Added ability to make distribution tarballs with Tachyon
Patrick Wendell - Bug fixes in ASM shading, fixes for log4j initialization, removing Ganglia due to LGPL license, and other miscellaneous bug fixes
Prabin Banka - RDD.zip and other missing RDD operations in PySpark
Prashant Sharma - RDD.foldByKey in PySpark, and other PySpark doc improvements
- Qiuzhuang - Bug fix in standalone worker
+ Qiuzhuang - Bug fix in standalone worker
Raymond Liu - Changed working directory in ZookeeperPersistenceEngine
Reynold Xin - Improvements to docs and test infrastructure
Sandy Ryza - Multiple important Yarn bug fixes and improvements
diff --git a/site/releases/spark-release-1-0-1.html b/site/releases/spark-release-1-0-1.html
index 2090e1235..a15b5e02e 100644
--- a/site/releases/spark-release-1-0-1.html
+++ b/site/releases/spark-release-1-0-1.html
@@ -242,8 +242,8 @@
Cheng Hao – SQL features
Cheng Lian – SQL features
Christian Tzolov – build improvement
- Clément MATHIEU – doc updates
- CodingCat – doc updates and bug fix
+ Clément MATHIEU – doc updates
+ CodingCat – doc updates and bug fix
Colin McCabe – bug fix
Daoyuan – SQL joins
David Lemieux – bug fix
@@ -259,7 +259,7 @@
Kan Zhang – PySpark SQL features
Kay Ousterhout – documentation fix
LY Lai – bug fix
- Lars Albertsson – bug fix
+ Lars Albertsson – bug fix
Lei Zhang – SQL fix and feature
Mark Hamstra – bug fix
Matei Zaharia – doc updates and bug fix
@@ -281,7 +281,7 @@
Shixiong Zhu – code clean-up
Szul, Piotr – bug fix
Takuya UESHIN – bug fixes and SQL features
- Thomas Graves – bug fix
+ Thomas Graves – bug fix
Uri Laserson – bug fix
Vadim Chekan – bug fix
Varakhedi Sujeet – ec2 r3 support
diff --git a/site/releases/spark-release-1-0-2.html b/site/releases/spark-release-1-0-2.html
index 7997c5964..ffcf953e5 100644
--- a/site/releases/spark-release-1-0-2.html
+++ b/site/releases/spark-release-1-0-2.html
@@ -252,7 +252,7 @@
johnnywalleye - Bug fixes in MLlib
joyyoj - Bug fix in Streaming
kballou - Doc fix
- lianhuiwang - Doc fix
+ lianhuiwang - Doc fix
witgo - Bug fix in sbt
diff --git a/site/releases/spark-release-1-1-0.html b/site/releases/spark-release-1-1-0.html
index 4ed53dbe2..3776fea79 100644
--- a/site/releases/spark-release-1-1-0.html
+++ b/site/releases/spark-release-1-1-0.html
@@ -181,7 +181,7 @@
Spark SQL adds a number of new features and performance improvements in this release. A JDBC/ODBC server allows users to connect to SparkSQL from many different applications and provides shared access to cached tables. A new module provides support for loading JSON data directly into Spark’s SchemaRDD format, including automatic schema inference. Spark SQL introduces dynamic bytecode generation in this release, a technique which significantly speeds up execution for queries that perform complex expression evaluation. This release also adds support for registering Python, Scala, and Java lambda functions as UDFs, which can then be called directly in SQL. Spark 1.1 adds a public types API to allow users to create SchemaRDD’s from custom data sources. Finally, many optimizations have been added to the native Parquet support as well as throughout the engine.
MLlib
-MLlib adds several new algorithms and optimizations in this release. 1.1 introduces a new library of statistical packages which provides exploratory analytic functions. These include stratified sampling, correlations, chi-squared tests and support for creating random datasets. This release adds utilities for feature extraction (Word2Vec and TF-IDF) and feature transformation (normalization and standard scaling). Also new are support for nonnegative matrix factorization and SVD via Lanczos. The decision tree algorithm has been added in Python and Java. A tree aggregation primitive has been added to help optimize many existing algorithms. Performance improves across the board in MLlib 1.1, with improvements of around 2-3X for many algorithms and up to 5X for large scale decision tree problems.
+MLlib adds several new algorithms and optimizations in this release. 1.1 introduces a new library of statistical packages which provides exploratory analytic functions. These include stratified sampling, correlations, chi-squared tests and support for creating random datasets. This release adds utilities for feature extraction (Word2Vec and TF-IDF) and feature transformation (normalization and standard scaling). Also new are support for nonnegative matrix factorization and SVD via Lanczos. The decision tree algorithm has been added in Python and Java. A tree aggregation primitive has been added to help optimize many existing algorithms. Performance improves across the board in MLlib 1.1, with improvements of around 2-3X for many algorithms and up to 5X for large scale decision tree problems.
GraphX and Spark Streaming
Spark streaming adds a new data source Amazon Kinesis. For the Apache Flume, a new mode is supported which pulls data from Flume, simplifying deployment and providing high availability. The first of a set of streaming machine learning algorithms is introduced with streaming linear regression. Finally, rate limiting has been added for streaming inputs. GraphX adds custom storage levels for vertices and edges along with improved numerical precision across the board. Finally, GraphX adds a new label propagation algorithm.
@@ -199,7 +199,7 @@
- The default value of spark.io.compression.codec is now snappy for improved memory usage. Old behavior can be restored by switching to lzf.
- - The default value of spark.broadcast.factory is now org.apache.spark.broadcast.TorrentBroadcastFactory for improved efficiency of broadcasts. Old behavior can be restored by switching to org.apache.spark.broadcast.HttpBroadcastFactory.
+ - The default value of spark.broadcast.factory is now org.apache.spark.broadcast.TorrentBroadcastFactory for improved efficiency of broadcasts. Old behavior can be restored by switching to org.apache.spark.broadcast.HttpBroadcastFactory.
- PySpark now performs external spilling during aggregations. Old behavior can be restored by setting spark.shuffle.spill to false.
- PySpark uses a new heuristic for determining the parallelism of shuffle operations. Old behavior can be restored by setting spark.default.parallelism to the number of cores in the cluster.
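A minimal sketch (not part of this patch) of how an application could pin the pre-1.1 defaults described in the migration notes above; the property names and values are taken verbatim from the release notes, while the app name and master are illustrative.

```scala
// Hypothetical example: restoring the Spark 1.0.x defaults listed in the
// migration notes when running on Spark 1.1.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("legacy-defaults-sketch")
  .setMaster("local[*]")  // illustrative; normally supplied by spark-submit
  .set("spark.io.compression.codec", "lzf")  // 1.1 default is snappy
  .set("spark.broadcast.factory", "org.apache.spark.broadcast.HttpBroadcastFactory")  // 1.1 default is TorrentBroadcastFactory
  .set("spark.shuffle.spill", "false")  // disable the new PySpark external spilling
val sc = new SparkContext(conf)
```

The same properties could also be placed in conf/spark-defaults.conf rather than set programmatically.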
@@ -259,7 +259,7 @@
Daneil Darabos – bug fixes and UI enhancements
Daoyuan Wang – SQL fixes
David Lemieux – bug fix
- Davies Liu – PySpark fixes and spilling
+ Davies Liu – PySpark fixes and spilling
DB Tsai – online summaries in MLlib and other MLlib features
Derek Ma – bug fix
Doris Xin – MLlib stats library and several fixes
diff --git a/site/releases/spark-release-1-2-0.html b/site/releases/spark-release-1-2-0.html
index ea3500392..82bd94450 100644
--- a/site/releases/spark-release-1-2-0.html
+++ b/site/releases/spark-release-1-2-0.html
@@ -178,7 +178,7 @@
In 1.2 Spark core upgrades two major subsystems to improve the performance and stability of very large scale shuffles. The first is Spark’s communication manager used during bulk transfers, which upgrades to a netty-based implementation. The second is Spark’s shuffle mechanism, which upgrades to the “sort based” shuffle initially released in Spark 1.1. These both improve the performance and stability of very large scale shuffles. Spark also adds an elastic scaling mechanism designed to improve cluster utilization during long running ETL-style jobs. This is currently supported on YARN and will make its way to other cluster managers in future versions. Finally, Spark 1.2 adds support for Scala 2.11. For instructions on building for Scala 2.11 see the build documentation.
Spark Streaming
-This release includes two major feature additions to Spark’s streaming library, a Python API and a write ahead log for full driver H/A. The Python API covers almost all the DStream transformations and output operations. Input sources based on text files and text over sockets are currently supported. Support for Kafka and Flume input streams in Python will be added in the next release. Second, Spark streaming now features H/A driver support through a write ahead log (WAL). In Spark 1.1 and earlier, some buffered (received but not yet processed) data can be lost during driver restarts. To prevent this Spark 1.2 adds an optional WAL, which buffers received data into a fault-tolerant file system (e.g. HDFS). See the streaming programming guide for more details.
+This release includes two major feature additions to Spark’s streaming library, a Python API and a write ahead log for full driver H/A. The Python API covers almost all the DStream transformations and output operations. Input sources based on text files and text over sockets are currently supported. Support for Kafka and Flume input streams in Python will be added in the next release. Second, Spark streaming now features H/A driver support through a write ahead log (WAL). In Spark 1.1 and earlier, some buffered (received but not yet processed) data can be lost during driver restarts. To prevent this Spark 1.2 adds an optional WAL, which buffers received data into a fault-tolerant file system (e.g. HDFS). See the streaming programming guide for more details.
MLLib
Spark 1.2 previews a new set of machine learning API’s in a package called spark.ml that supports learning pipelines, where multiple algorithms are run in sequence with varying parameters. This type of pipeline is common in practical machine learning deployments. The new ML package uses Spark’s SchemaRDD to represent ML datasets, providing direct interoperability with Spark SQL. In addition to the new API, Spark 1.2 extends decision trees with two tree ensemble methods: random forests and gradient-boosted trees, among the most successful tree-based models for classification and regression. Finally, MLlib’s Python implementation receives a major update in 1.2 to simplify the process of adding Python APIs, along with better Python API coverage.
diff --git a/site/releases/spark-release-1-3-0.html b/site/releases/spark-release-1-3-0.html
index d4e615a33..b8cffe3cf 100644
--- a/site/releases/spark-release-1-3-0.html
+++ b/site/releases/spark-release-1-3-0.html
@@ -175,7 +175,7 @@
To download Spark 1.3 visit the downloads page.
Spark Core
-Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports multi level aggregation trees to help speed up expensive reduce operations. Improved error reporting has been added for certain gotcha operations. Spark’s Jetty dependency is now shaded to help avoid conflicts with user programs. Spark now supports SSL encryption for some communication endpoints. Finally, realtime GC metrics and record counts have been added to the UI.
+Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports multi level aggregation trees to help speed up expensive reduce operations. Improved error reporting has been added for certain gotcha operations. Spark’s Jetty dependency is now shaded to help avoid conflicts with user programs. Spark now supports SSL encryption for some communication endpoints. Finally, realtime GC metrics and record counts have been added to the UI.
DataFrame API
Spark 1.3 adds a new DataFrames API that provides powerful and convenient operators when working with structured datasets. The DataFrame is an evolution of the base RDD API that includes named fields along with schema information. It’s easy to construct a DataFrame from sources such as Hive tables, JSON data, a JDBC database, or any implementation of Spark’s new data source API. Data frames will become a common interchange format between Spark components and when importing and exporting data to other systems. Data frames are supported in Python, Scala, and Java.
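A minimal sketch (not part of this patch) illustrating the 1.3-era DataFrame API described above; the input path is hypothetical and sc is assumed to be an existing SparkContext.

```scala
// Hypothetical sketch of the Spark 1.3 DataFrame API described above.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)          // sc: an existing SparkContext
val df = sqlContext.jsonFile("people.json")  // build a DataFrame from JSON (schema inferred)
df.printSchema()                             // inspect the named fields
df.select("name").show()                     // column projection
df.filter(df("age") > 21).groupBy("age").count().show()  // relational-style operators
```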
@@ -187,7 +187,7 @@
In this release Spark MLlib introduces several new algorithms: latent Dirichlet allocation (LDA) for topic modeling, multinomial logistic regression for multiclass classification, Gaussian mixture model (GMM) and power iteration clustering for clustering, FP-growth for frequent pattern mining, and block matrix abstraction for distributed linear algebra. Initial support has been added for model import/export in exchangeable format, which will be expanded in future versions to cover more model types in Java/Python/Scala. The implementations of k-means and ALS receive updates that lead to significant performance gain. PySpark now supports the ML pipeline API added in Spark 1.2, and gradient boosted trees and Gaussian mixture model. Finally, the ML pipeline API has been ported to support the new DataFrames abstraction.
Spark Streaming
-Spark 1.3 introduces a new direct Kafka API (docs) which enables exactly-once delivery without the use of write ahead logs. It also adds a Python Kafka API along with infrastructure for additional Python API’s in future releases. An online version of logistic regression and the ability to read binary records have also been added. For stateful operations, support has been added for loading of an initial state RDD. Finally, the streaming programming guide has been updated to include information about SQL and DataFrame operations within streaming applications, and important clarifications to the fault-tolerance semantics.
+Spark 1.3 introduces a new direct Kafka API (docs) which enables exactly-once delivery without the use of write ahead logs. It also adds a Python Kafka API along with infrastructure for additional Python API’s in future releases. An online version of logistic regression and the ability to read binary records have also been added. For stateful operations, support has been added for loading of an initial state RDD. Finally, the streaming programming guide has been updated to include information about SQL and DataFrame operations within streaming applications, and important clarifications to the fault-tolerance semantics.
GraphX
GraphX adds a handful of utility functions in this release, including conversion into a canonical edge graph.
@@ -203,7 +203,7 @@
- SPARK-6194: A memory leak in PySpark’s collect().
- SPARK-6222: An issue with failure recovery in Spark Streaming.
- - SPARK-6315: Spark SQL can’t read parquet data generated with Spark 1.1.
+ - SPARK-6315: Spark SQL can’t read parquet data generated with Spark 1.1.
- SPARK-6247: Errors analyzing certain join types in Spark SQL.
diff --git a/site/releases/spark-release-1-3-1.html b/site/releases/spark-release-1-3-1.html
index 5f0aee9bd..feb9bb1f2 100644
--- a/site/releases/spark-release-1-3-1.html
+++ b/site/releases/spark-release-1-3-1.html
@@ -180,10 +180,10 @@
Spark SQL
- Unable to use reserved words in DDL (SPARK-6250)
- - Parquet no longer caches metadata (SPARK-6575)
+ - Parquet no longer caches metadata (SPARK-6575)
- Bug when joining two Parquet tables (SPARK-6851)
- - Unable to read parquet data generated by Spark 1.1.1 (SPARK-6315)
- - Parquet data source may use wrong Hadoop FileSystem (SPARK-6330)
+ - Unable to read parquet data generated by Spark 1.1.1 (SPARK-6315)
+ - Parquet data source may use wrong Hadoop FileSystem (SPARK-6330)
Spark Streaming
diff --git a/site/releases/spark-release-1-4-0.html b/site/releases/spark-release-1-4-0.html
index d7675bfe3..f1a17be9e 100644
--- a/site/releases/spark-release-1-4-0.html
+++ b/site/releases/spark-release-1-4-0.html
@@ -234,7 +234,7 @@ Python coverage. MLlib also adds several new algorithms.
Spark Streaming
-Spark streaming adds visual instrumentation graphs and significantly improved debugging information in the UI. It also enhances support for both Kafka and Kinesis.
+Spark streaming adds visual instrumentation graphs and significantly improved debugging information in the UI. It also enhances support for both Kafka and Kinesis.
- SPARK-7602: Visualization and monitoring in the streaming UI including batch drill down (SPARK-6796, SPARK-6862)
@@ -260,7 +260,7 @@ Python coverage. MLlib also adds several new algorithms.
Test Partners
-Thanks to the following organizations, who helped benchmark or integration test release candidates: Intel, Palantir, Cloudera, Mesosphere, Huawei, Shopify, Netflix, Yahoo, UC Berkeley and Databricks.
+Thanks to the following organizations, who helped benchmark or integration test release candidates: Intel, Palantir, Cloudera, Mesosphere, Huawei, Shopify, Netflix, Yahoo, UC Berkeley and Databricks.
Contributors