From 9700f2f4afe566412bdb73b443b3aad99b375af1 Mon Sep 17 00:00:00 2001
From: Matei Zaharia
Date: Wed, 3 Aug 2016 19:12:55 -0700
Subject: Trademarks page and some FAQ cleanup
---
 site/releases/spark-release-0-3.html   |  2 +-
 site/releases/spark-release-0-5-0.html |  2 +-
 site/releases/spark-release-0-5-1.html |  2 +-
 site/releases/spark-release-0-5-2.html |  2 +-
 site/releases/spark-release-0-6-0.html |  2 +-
 site/releases/spark-release-0-6-1.html |  2 +-
 site/releases/spark-release-0-6-2.html |  2 +-
 site/releases/spark-release-0-7-0.html |  2 +-
 site/releases/spark-release-0-7-2.html |  2 +-
 site/releases/spark-release-0-7-3.html |  2 +-
 site/releases/spark-release-0-8-0.html |  6 +++---
 site/releases/spark-release-0-8-1.html |  2 +-
 site/releases/spark-release-0-9-0.html |  2 +-
 site/releases/spark-release-0-9-1.html | 22 +++++++++++-----------
 site/releases/spark-release-0-9-2.html |  2 +-
 site/releases/spark-release-1-0-0.html |  2 +-
 site/releases/spark-release-1-0-1.html | 10 +++++-----
 site/releases/spark-release-1-0-2.html |  4 ++--
 site/releases/spark-release-1-1-0.html |  8 ++++----
 site/releases/spark-release-1-1-1.html |  2 +-
 site/releases/spark-release-1-2-0.html |  4 ++--
 site/releases/spark-release-1-2-1.html |  2 +-
 site/releases/spark-release-1-2-2.html |  2 +-
 site/releases/spark-release-1-3-0.html |  8 ++++----
 site/releases/spark-release-1-3-1.html |  8 ++++----
 site/releases/spark-release-1-4-0.html |  6 +++---
 site/releases/spark-release-1-4-1.html |  2 +-
 site/releases/spark-release-1-5-0.html | 32 ++++++++++++++++----------------
 site/releases/spark-release-1-5-1.html |  2 +-
 site/releases/spark-release-1-5-2.html |  2 +-
 site/releases/spark-release-1-6-0.html | 22 +++++++++++-----------
 site/releases/spark-release-1-6-1.html |  2 +-
 site/releases/spark-release-1-6-2.html |  2 +-
 site/releases/spark-release-2-0-0.html | 38 +++++++++++++++++++-------------------
 34 files changed, 106 insertions(+), 106 deletions(-)
(limited to 'site/releases')

diff --git a/site/releases/spark-release-0-3.html b/site/releases/spark-release-0-3.html
index f09bf6b51..add3e10e5 100644
--- a/site/releases/spark-release-0-3.html
+++ b/site/releases/spark-release-0-3.html
@@ -252,7 +252,7 @@ squares.saveAsSequenceFile("hdfs://...")
diff --git a/site/releases/spark-release-0-5-0.html b/site/releases/spark-release-0-5-0.html
index 2d8d27f3d..54500a86d 100644
--- a/site/releases/spark-release-0-5-0.html
+++ b/site/releases/spark-release-0-5-0.html
@@ -224,7 +224,7 @@
diff --git a/site/releases/spark-release-0-5-1.html b/site/releases/spark-release-0-5-1.html
index 03f38ce87..f9e239224 100644
--- a/site/releases/spark-release-0-5-1.html
+++ b/site/releases/spark-release-0-5-1.html
@@ -234,7 +234,7 @@
diff --git a/site/releases/spark-release-0-5-2.html b/site/releases/spark-release-0-5-2.html
index 5709680f8..33da64481 100644
--- a/site/releases/spark-release-0-5-2.html
+++ b/site/releases/spark-release-0-5-2.html
@@ -203,7 +203,7 @@
diff --git a/site/releases/spark-release-0-6-0.html b/site/releases/spark-release-0-6-0.html
index 6636ef1fd..eedd8a813 100644
--- a/site/releases/spark-release-0-6-0.html
+++ b/site/releases/spark-release-0-6-0.html
@@ -278,7 +278,7 @@
diff --git a/site/releases/spark-release-0-6-1.html b/site/releases/spark-release-0-6-1.html
index 3861c9e6b..4804b93c7 100644
--- a/site/releases/spark-release-0-6-1.html
+++ b/site/releases/spark-release-0-6-1.html
@@ -218,7 +218,7 @@
diff --git a/site/releases/spark-release-0-6-2.html b/site/releases/spark-release-0-6-2.html
index c5c0bd9d8..b811390a2 100644
--- a/site/releases/spark-release-0-6-2.html
+++ b/site/releases/spark-release-0-6-2.html
@@ -231,7 +231,7 @@
diff --git a/site/releases/spark-release-0-7-0.html b/site/releases/spark-release-0-7-0.html
index 707e3adb2..580e8fc73 100644
--- a/site/releases/spark-release-0-7-0.html
+++ b/site/releases/spark-release-0-7-0.html
@@ -300,7 +300,7 @@
diff --git a/site/releases/spark-release-0-7-2.html b/site/releases/spark-release-0-7-2.html
index a534c721f..a565741d1 100644
--- a/site/releases/spark-release-0-7-2.html
+++ b/site/releases/spark-release-0-7-2.html
@@ -242,7 +242,7 @@
diff --git a/site/releases/spark-release-0-7-3.html b/site/releases/spark-release-0-7-3.html
index 884aa6fd4..2d6ae7307 100644
--- a/site/releases/spark-release-0-7-3.html
+++ b/site/releases/spark-release-0-7-3.html
@@ -236,7 +236,7 @@
diff --git a/site/releases/spark-release-0-8-0.html b/site/releases/spark-release-0-8-0.html
index b33e71f80..9815cfea9 100644
--- a/site/releases/spark-release-0-8-0.html
+++ b/site/releases/spark-release-0-8-0.html
@@ -210,13 +210,13 @@

Spark’s internal job scheduler has been refactored and extended to include more sophisticated scheduling policies. In particular, a fair scheduler implementation now allows multiple users to share an instance of Spark, which helps users running shorter jobs to achieve good performance, even when longer-running jobs are running in parallel. Support for topology-aware scheduling has been extended, including the ability to take into account rack locality and support for multiple executors on a single machine.

Easier Deployment and Linking

-

User programs can now link to Spark no matter which Hadoop version they need, without having to publish a version of spark-core specifically for that Hadoop version. An explanation of how to link against different Hadoop versions is provided here.

+

User programs can now link to Spark no matter which Hadoop version they need, without having to publish a version of spark-core specifically for that Hadoop version. An explanation of how to link against different Hadoop versions is provided here.

Expanded EC2 Capabilities

Spark’s EC2 scripts now support launching in any availability zone. Support has also been added for EC2 instance types which use the newer “HVM” architecture. This includes the cluster compute (cc1/cc2) family of instance types. We’ve also added support for running newer versions of HDFS alongside Spark. Finally, we’ve added the ability to launch clusters with maintenance releases of Spark in addition to launching the newest release.

Improved Documentation

-

This release adds documentation about cluster hardware provisioning and inter-operation with common Hadoop distributions. Docs are also included to cover the MLlib machine learning functions and new cluster monitoring features. Existing documentation has been updated to reflect changes in building and deploying Spark.

+

This release adds documentation about cluster hardware provisioning and inter-operation with common Hadoop distributions. Docs are also included to cover the MLlib machine learning functions and new cluster monitoring features. Existing documentation has been updated to reflect changes in building and deploying Spark.

Other Improvements

Improvements to other deployment scenarios

@@ -230,19 +230,19 @@

Optimizations to MLLib

Bug fixes and better API parity for PySpark

@@ -274,13 +274,13 @@
  • Kay Ousterhout - Multiple bug fixes in scheduler’s handling of task failures
  • Kousuke Saruta - Use of https to access github
  • Mark Grover - Bug fix in distribution tar.gz
  • -
  • Matei Zaharia - Bug fixes in handling of task failures due to NPE, and cleaning up of scheduler data structures
  • +
  • Matei Zaharia - Bug fixes in handling of task failures due to NPE, and cleaning up of scheduler data structures
  • Nan Zhu - Bug fixes in PySpark RDD.takeSample and adding of JARs using ADD_JAR - and improvements to docs
  • Nick Lanham - Added ability to make distribution tarballs with Tachyon
  • Patrick Wendell - Bug fixes in ASM shading, fixes for log4j initialization, removing Ganglia due to LGPL license, and other miscallenous bug fixes
  • Prabin Banka - RDD.zip and other missing RDD operations in PySpark
  • Prashant Sharma - RDD.foldByKey in PySpark, and other PySpark doc improvements
  • -
  • Qiuzhuang - Bug fix in standalone worker
  • +
  • Qiuzhuang - Bug fix in standalone worker
  • Raymond Liu - Changed working directory in ZookeeperPersistenceEngine
  • Reynold Xin - Improvements to docs and test infrastructure
  • Sandy Ryza - Multiple important Yarn bug fixes and improvements
  • @@ -307,7 +307,7 @@
diff --git a/site/releases/spark-release-0-9-2.html b/site/releases/spark-release-0-9-2.html
index 7b4a37862..ba99826f8 100644
--- a/site/releases/spark-release-0-9-2.html
+++ b/site/releases/spark-release-0-9-2.html
@@ -280,7 +280,7 @@
diff --git a/site/releases/spark-release-1-0-0.html b/site/releases/spark-release-1-0-0.html
index da2c5c899..18447f949 100644
--- a/site/releases/spark-release-1-0-0.html
+++ b/site/releases/spark-release-1-0-0.html
@@ -373,7 +373,7 @@
diff --git a/site/releases/spark-release-1-0-1.html b/site/releases/spark-release-1-0-1.html
index 66b1dbfc2..22905b65a 100644
--- a/site/releases/spark-release-1-0-1.html
+++ b/site/releases/spark-release-1-0-1.html
@@ -258,8 +258,8 @@
  • Cheng Hao – SQL features
  • Cheng Lian – SQL features
  • Christian Tzolov – build improvmenet
  • -
  • Clément MATHIEU – doc updates
  • -
  • CodingCat – doc updates and bug fix
  • +
  • Clément MATHIEU – doc updates
  • +
  • CodingCat – doc updates and bug fix
  • Colin McCabe – bug fix
  • Daoyuan – SQL joins
  • David Lemieux – bug fix
  • @@ -275,7 +275,7 @@
  • Kan Zhang – PySpark SQL features
  • Kay Ousterhout – documentation fix
  • LY Lai – bug fix
  • -
  • Lars Albertsson – bug fix
  • +
  • Lars Albertsson – bug fix
  • Lei Zhang – SQL fix and feature
  • Mark Hamstra – bug fix
  • Matei Zaharia – doc updates and bug fix
  • @@ -297,7 +297,7 @@
  • Shixiong Zhu – code clean-up
  • Szul, Piotr – bug fix
  • Takuya UESHIN – bug fixes and SQL features
  • -
  • Thomas Graves – bug fix
  • +
  • Thomas Graves – bug fix
  • Uri Laserson – bug fix
  • Vadim Chekan – bug fix
  • Varakhedi Sujeet – ec2 r3 support
  • @@ -331,7 +331,7 @@
diff --git a/site/releases/spark-release-1-0-2.html b/site/releases/spark-release-1-0-2.html
index 8a6470f9b..ae5916ab1 100644
--- a/site/releases/spark-release-1-0-2.html
+++ b/site/releases/spark-release-1-0-2.html
@@ -268,7 +268,7 @@
  • johnnywalleye - Bug fixes in MLlib
  • joyyoj - Bug fix in Streaming
  • kballou - Doc fix
  • -
  • lianhuiwang - Doc fix
  • +
  • lianhuiwang - Doc fix
  • witgo - Bug fix in sbt
  • @@ -288,7 +288,7 @@
diff --git a/site/releases/spark-release-1-1-0.html b/site/releases/spark-release-1-1-0.html
index f2d1a6737..7f8812649 100644
--- a/site/releases/spark-release-1-1-0.html
+++ b/site/releases/spark-release-1-1-0.html
@@ -197,7 +197,7 @@

    Spark SQL adds a number of new features and performance improvements in this release. A JDBC/ODBC server allows users to connect to SparkSQL from many different applications and provides shared access to cached tables. A new module provides support for loading JSON data directly into Spark’s SchemaRDD format, including automatic schema inference. Spark SQL introduces dynamic bytecode generation in this release, a technique which significantly speeds up execution for queries that perform complex expression evaluation. This release also adds support for registering Python, Scala, and Java lambda functions as UDFs, which can then be called directly in SQL. Spark 1.1 adds a public types API to allow users to create SchemaRDD’s from custom data sources. Finally, many optimizations have been added to the native Parquet support as well as throughout the engine.

    MLlib

    -

    MLlib adds several new algorithms and optimizations in this release. 1.1 introduces a new library of statistical packages which provides exploratory analytic functions. These include stratified sampling, correlations, chi-squared tests and support for creating random datasets. This release adds utilities for feature extraction (Word2Vec and TF-IDF) and feature transformation (normalization and standard scaling). Also new are support for nonnegative matrix factorization and SVD via Lanczos. The decision tree algorithm has been added in Python and Java. A tree aggregation primitive has been added to help optimize many existing algorithms. Performance improves across the board in MLlib 1.1, with improvements of around 2-3X for many algorithms and up to 5X for large scale decision tree problems.

    +

    MLlib adds several new algorithms and optimizations in this release. 1.1 introduces a new library of statistical packages which provides exploratory analytic functions. These include stratified sampling, correlations, chi-squared tests and support for creating random datasets. This release adds utilities for feature extraction (Word2Vec and TF-IDF) and feature transformation (normalization and standard scaling). Also new are support for nonnegative matrix factorization and SVD via Lanczos. The decision tree algorithm has been added in Python and Java. A tree aggregation primitive has been added to help optimize many existing algorithms. Performance improves across the board in MLlib 1.1, with improvements of around 2-3X for many algorithms and up to 5X for large scale decision tree problems.

    GraphX and Spark Streaming

    Spark streaming adds a new data source Amazon Kinesis. For the Apache Flume, a new mode is supported which pulls data from Flume, simplifying deployment and providing high availability. The first of a set of streaming machine learning algorithms is introduced with streaming linear regression. Finally, rate limiting has been added for streaming inputs. GraphX adds custom storage levels for vertices and edges along with improved numerical precision across the board. Finally, GraphX adds a new label propagation algorithm.

    @@ -215,7 +215,7 @@
    @@ -275,7 +275,7 @@
  • Daneil Darabos – bug fixes and UI enhancements
  • Daoyuan Wang – SQL fixes
  • David Lemieux – bug fix
  • -
  • Davies Liu – PySpark fixes and spilling
  • +
  • Davies Liu – PySpark fixes and spilling
  • DB Tsai – online summaries in MLlib and other MLlib features
  • Derek Ma – bug fix
  • Doris Xin – MLlib stats library and several fixes
  • @@ -424,7 +424,7 @@
diff --git a/site/releases/spark-release-1-1-1.html b/site/releases/spark-release-1-1-1.html
index f9291a3b2..b2da1f11f 100644
--- a/site/releases/spark-release-1-1-1.html
+++ b/site/releases/spark-release-1-1-1.html
@@ -310,7 +310,7 @@
diff --git a/site/releases/spark-release-1-2-0.html b/site/releases/spark-release-1-2-0.html
index 020984f9c..344be74b6 100644
--- a/site/releases/spark-release-1-2-0.html
+++ b/site/releases/spark-release-1-2-0.html
@@ -194,7 +194,7 @@

    In 1.2 Spark core upgrades two major subsystems to improve the performance and stability of very large scale shuffles. The first is Spark’s communication manager used during bulk transfers, which upgrades to a netty-based implementation. The second is Spark’s shuffle mechanism, which upgrades to the “sort based” shuffle initially released in Spark 1.1. These both improve the performance and stability of very large scale shuffles. Spark also adds an elastic scaling mechanism designed to improve cluster utilization during long running ETL-style jobs. This is currently supported on YARN and will make its way to other cluster managers in future versions. Finally, Spark 1.2 adds support for Scala 2.11. For instructions on building for Scala 2.11 see the build documentation.

    Spark Streaming

    -

    This release includes two major feature additions to Spark’s streaming library, a Python API and a write ahead log for full driver H/A. The Python API covers almost all the DStream transformations and output operations. Input sources based on text files and text over sockets are currently supported. Support for Kafka and Flume input streams in Python will be added in the next release. Second, Spark streaming now features H/A driver support through a write ahead log (WAL). In Spark 1.1 and earlier, some buffered (received but not yet processed) data can be lost during driver restarts. To prevent this Spark 1.2 adds an optional WAL, which buffers received data into a fault-tolerant file system (e.g. HDFS). See the streaming programming guide for more details.

    +

    This release includes two major feature additions to Spark’s streaming library, a Python API and a write ahead log for full driver H/A. The Python API covers almost all the DStream transformations and output operations. Input sources based on text files and text over sockets are currently supported. Support for Kafka and Flume input streams in Python will be added in the next release. Second, Spark streaming now features H/A driver support through a write ahead log (WAL). In Spark 1.1 and earlier, some buffered (received but not yet processed) data can be lost during driver restarts. To prevent this Spark 1.2 adds an optional WAL, which buffers received data into a fault-tolerant file system (e.g. HDFS). See the streaming programming guide for more details.

    MLLib

    Spark 1.2 previews a new set of machine learning API’s in a package called spark.ml that supports learning pipelines, where multiple algorithms are run in sequence with varying parameters. This type of pipeline is common in practical machine learning deployments. The new ML package uses Spark’s SchemaRDD to represent ML datasets, providing direct interoperability with Spark SQL. In addition to the new API, Spark 1.2 extends decision trees with two tree ensemble methods: random forests and gradient-boosted trees, among the most successful tree-based models for classification and regression. Finally, MLlib’s Python implementation receives a major update in 1.2 to simplify the process of adding Python APIs, along with better Python API coverage.

    @@ -438,7 +438,7 @@
diff --git a/site/releases/spark-release-1-2-1.html b/site/releases/spark-release-1-2-1.html
index 3fbd5885b..e9e3bd689 100644
--- a/site/releases/spark-release-1-2-1.html
+++ b/site/releases/spark-release-1-2-1.html
@@ -310,7 +310,7 @@
diff --git a/site/releases/spark-release-1-2-2.html b/site/releases/spark-release-1-2-2.html
index b23ee12cf..ad2e10999 100644
--- a/site/releases/spark-release-1-2-2.html
+++ b/site/releases/spark-release-1-2-2.html
@@ -270,7 +270,7 @@
diff --git a/site/releases/spark-release-1-3-0.html b/site/releases/spark-release-1-3-0.html
index aaf84868f..f37a59d3c 100644
--- a/site/releases/spark-release-1-3-0.html
+++ b/site/releases/spark-release-1-3-0.html
@@ -191,7 +191,7 @@

    To download Spark 1.3 visit the downloads page.

    Spark Core

    -

    Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports multi level aggregation trees to help speed up expensive reduce operations. Improved error reporting has been added for certain gotcha operations. Spark’s Jetty dependency is now shaded to help avoid conflicts with user programs. Spark now supports SSL encryption for some communication endpoints. Finaly, realtime GC metrics and record counts have been added to the UI.

    +

    Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports multi level aggregation trees to help speed up expensive reduce operations. Improved error reporting has been added for certain gotcha operations. Spark’s Jetty dependency is now shaded to help avoid conflicts with user programs. Spark now supports SSL encryption for some communication endpoints. Finaly, realtime GC metrics and record counts have been added to the UI.

    DataFrame API

    Spark 1.3 adds a new DataFrames API that provides powerful and convenient operators when working with structured datasets. The DataFrame is an evolution of the base RDD API that includes named fields along with schema information. It’s easy to construct a DataFrame from sources such as Hive tables, JSON data, a JDBC database, or any implementation of Spark’s new data source API. Data frames will become a common interchange format between Spark components and when importing and exporting data to other systems. Data frames are supported in Python, Scala, and Java.

    @@ -203,7 +203,7 @@

    In this release Spark MLlib introduces several new algorithms: latent Dirichlet allocation (LDA) for topic modeling, multinomial logistic regression for multiclass classification, Gaussian mixture model (GMM) and power iteration clustering for clustering, FP-growth for frequent pattern mining, and block matrix abstraction for distributed linear algebra. Initial support has been added for model import/export in exchangeable format, which will be expanded in future versions to cover more model types in Java/Python/Scala. The implementations of k-means and ALS receive updates that lead to significant performance gain. PySpark now supports the ML pipeline API added in Spark 1.2, and gradient boosted trees and Gaussian mixture model. Finally, the ML pipeline API has been ported to support the new DataFrames abstraction.

    Spark Streaming

    -

    Spark 1.3 introduces a new direct Kafka API (docs) which enables exactly-once delivery without the use of write ahead logs. It also adds a Python Kafka API along with infrastructure for additional Python API’s in future releases. An online version of logistic regression and the ability to read binary records have also been added. For stateful operations, support has been added for loading of an initial state RDD. Finally, the streaming programming guide has been updated to include information about SQL and DataFrame operations within streaming applications, and important clarifications to the fault-tolerance semantics.

    +

    Spark 1.3 introduces a new direct Kafka API (docs) which enables exactly-once delivery without the use of write ahead logs. It also adds a Python Kafka API along with infrastructure for additional Python API’s in future releases. An online version of logistic regression and the ability to read binary records have also been added. For stateful operations, support has been added for loading of an initial state RDD. Finally, the streaming programming guide has been updated to include information about SQL and DataFrame operations within streaming applications, and important clarifications to the fault-tolerance semantics.

    GraphX

    GraphX adds a handful of utility functions in this release, including conversion into a canonical edge graph.

    @@ -219,7 +219,7 @@
    @@ -415,7 +415,7 @@
diff --git a/site/releases/spark-release-1-3-1.html b/site/releases/spark-release-1-3-1.html
index e13b6acf5..5c444b632 100644
--- a/site/releases/spark-release-1-3-1.html
+++ b/site/releases/spark-release-1-3-1.html
@@ -196,10 +196,10 @@

    Spark SQL

    Spark Streaming

    @@ -302,7 +302,7 @@
diff --git a/site/releases/spark-release-1-4-0.html b/site/releases/spark-release-1-4-0.html
index c5ba82024..e6e1f0286 100644
--- a/site/releases/spark-release-1-4-0.html
+++ b/site/releases/spark-release-1-4-0.html
@@ -250,7 +250,7 @@ Python coverage. MLlib also adds several new algorithms.

    Spark Streaming

    -

    Spark streaming adds visual instrumentation graphs and significantly improved debugging information in the UI. It also enhances support for both Kafka and Kinesis.

    +

    Spark streaming adds visual instrumentation graphs and significantly improved debugging information in the UI. It also enhances support for both Kafka and Kinesis.