path: root/releases/_posts/2014-05-30-spark-release-1-0-0.md
author    Matei Alexandru Zaharia <matei@apache.org>    2014-06-04 20:18:25 +0000
committer Matei Alexandru Zaharia <matei@apache.org>    2014-06-04 20:18:25 +0000
commit    63c7d387a2a69e5ccd23c3de0af0e4cc428fbfd4 (patch)
tree      2cb07b5739340d1d31ce1a6d4f0122b2d7bb79fb /releases/_posts/2014-05-30-spark-release-1-0-0.md
parent    638088923dbfe94215c4e0edfac8beb2e7b483f8 (diff)
website tweaks: release note links and scaling FAQ
Diffstat (limited to 'releases/_posts/2014-05-30-spark-release-1-0-0.md')
-rw-r--r-- releases/_posts/2014-05-30-spark-release-1-0-0.md | 16
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/releases/_posts/2014-05-30-spark-release-1-0-0.md b/releases/_posts/2014-05-30-spark-release-1-0-0.md
index 21bb309fe..acb6b3e61 100644
--- a/releases/_posts/2014-05-30-spark-release-1-0-0.md
+++ b/releases/_posts/2014-05-30-spark-release-1-0-0.md
@@ -11,7 +11,7 @@ meta:
_wpas_done_all: '1'
---
-Spark 1.0.0 is a major release marking the start of the 1.X line. This release brings both a variety of new features and strong API compatibility guarantees throughout the 1.X line. Spark 1.0 adds a new major component, [Spark SQL]({{site.url}}docs/1.0.0/sql-programming-guide.html), for loading and manipulating structured data in Spark. It includes major extensions to all of Spark’s existing standard libraries ([ML]({{site.url}}docs/1.0.0/mllib-guide.html), [Streaming]({{site.url}}docs/1.0.0/streaming-programming-guide.html), and [GraphX]({{site.url}}docs/1.0.0/graphx-programming-guide.html)) while also enhancing language support in Java and Python. Finally, Spark 1.0 brings operational improvements including full support for the Hadoop/YARN security model and a unified submission process for all supported cluster managers.
+Spark 1.0.0 is a major release marking the start of the 1.X line. This release brings both a variety of new features and strong API compatibility guarantees throughout the 1.X line. Spark 1.0 adds a new major component, [Spark SQL]({{site.url}}docs/latest/sql-programming-guide.html), for loading and manipulating structured data in Spark. It includes major extensions to all of Spark’s existing standard libraries ([ML]({{site.url}}docs/latest/mllib-guide.html), [Streaming]({{site.url}}docs/latest/streaming-programming-guide.html), and [GraphX]({{site.url}}docs/latest/graphx-programming-guide.html)) while also enhancing language support in Java and Python. Finally, Spark 1.0 brings operational improvements including full support for the Hadoop/YARN security model and a unified submission process for all supported cluster managers.
You can download Spark 1.0.0 as either a
<a href="http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0.tgz" onClick="trackOutboundLink(this, 'Release Download Links', 'cloudfront_spark-1.0.0.tgz'); return false;">source package</a>
@@ -28,22 +28,22 @@ Spark 1.0.0 is the first release in the 1.X major line. Spark is guaranteeing st
For users running in secured Hadoop environments, Spark now integrates with the Hadoop/YARN security model. Spark will authenticate job submission, securely transfer HDFS credentials, and authenticate communication between components.
### Operational and Packaging Improvements
-This release significantly simplifies the process of bundling and submitting a Spark application. A new [spark-submit tool]({{site.url}}docs/1.0.0/submitting-applications.html) allows users to submit an application to any Spark cluster, including local clusters, Mesos, or YARN, through a common process. The documentation for bundling Spark applications has been substantially expanded. We’ve also added a history server for Spark’s web UI, allowing users to view Spark application data after individual applications are finished.
+This release significantly simplifies the process of bundling and submitting a Spark application. A new [spark-submit tool]({{site.url}}docs/latest/submitting-applications.html) allows users to submit an application to any Spark cluster, including local clusters, Mesos, or YARN, through a common process. The documentation for bundling Spark applications has been substantially expanded. We’ve also added a history server for Spark’s web UI, allowing users to view Spark application data after individual applications are finished.
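> For context on the unified submission process mentioned above, here is a minimal sketch (not part of this commit) of a Scala application that could be bundled and launched with spark-submit. The package, class, jar name, and input path are illustrative placeholders; the `--class`/`--master` options are the tool's standard flags.
>
> ```scala
> // Minimal Spark application sketch. After bundling into a jar, it could be submitted
> // to any supported cluster manager through the common spark-submit process, e.g.:
> //   bin/spark-submit --class example.SimpleApp --master local[4] simple-app.jar
> //   bin/spark-submit --class example.SimpleApp --master yarn-cluster simple-app.jar
> package example
>
> import org.apache.spark.{SparkConf, SparkContext}
>
> object SimpleApp {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf().setAppName("SimpleApp")
>     val sc = new SparkContext(conf)
>     // Count lines containing "spark" in a text file (path is a placeholder).
>     val count = sc.textFile("hdfs:///tmp/input.txt")
>                   .filter(_.contains("spark"))
>                   .count()
>     println(s"Lines containing 'spark': $count")
>     sc.stop()
>   }
> }
> ```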
### Spark SQL
-This release introduces [Spark SQL]({{site.url}}docs/1.0.0/sql-programming-guide.html) as a new alpha component. Spark SQL provides support for loading and manipulating structured data in Spark, either from external structured data sources (currently Hive and Parquet) or by adding a schema to an existing RDD. Spark SQL’s API interoperates with the RDD data model, allowing users to interleave Spark code with SQL statements. Under the hood, Spark SQL uses the Catalyst optimizer to choose an efficient execution plan, and can automatically push predicates into storage formats like Parquet. In future releases, Spark SQL will also provide a common API to other storage systems.
+This release introduces [Spark SQL]({{site.url}}docs/latest/sql-programming-guide.html) as a new alpha component. Spark SQL provides support for loading and manipulating structured data in Spark, either from external structured data sources (currently Hive and Parquet) or by adding a schema to an existing RDD. Spark SQL’s API interoperates with the RDD data model, allowing users to interleave Spark code with SQL statements. Under the hood, Spark SQL uses the Catalyst optimizer to choose an efficient execution plan, and can automatically push predicates into storage formats like Parquet. In future releases, Spark SQL will also provide a common API to other storage systems.
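> As a rough illustration of interleaving Spark code with SQL statements in the new alpha component, the following sketch adds a schema to an existing RDD and queries it, in the style of the 1.0 SQL programming guide. The `Person` case class, file path, and table name are made up for the example.
>
> ```scala
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.SQLContext
>
> // Case class defining the schema applied to an existing RDD.
> case class Person(name: String, age: Int)
>
> object SparkSQLExample {
>   def main(args: Array[String]): Unit = {
>     val sc = new SparkContext(new SparkConf().setAppName("SparkSQLExample"))
>     val sqlContext = new SQLContext(sc)
>     import sqlContext._  // implicit conversion from RDDs of case classes, plus sql()
>
>     // Add a schema to an existing RDD and register it as a table.
>     val people = sc.textFile("people.txt")
>                    .map(_.split(","))
>                    .map(p => Person(p(0), p(1).trim.toInt))
>     people.registerAsTable("people")
>
>     // Mix a SQL query with ordinary RDD operations on the resulting SchemaRDD.
>     val teenagers = sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
>     teenagers.map(row => "Name: " + row(0)).collect().foreach(println)
>     sc.stop()
>   }
> }
> ```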
### MLlib Improvements
-In 1.0.0, Spark’s MLlib adds support for sparse feature vectors in Scala, Java, and Python. It takes advantage of sparsity in both storage and computation in linear methods, k-means, and naive Bayes. In addition, this release adds several new algorithms: scalable decision trees for both classification and regression, distributed matrix algorithms including SVD and PCA, model evaluation functions, and L-BFGS as an optimization primitive. The programming guide and code examples for MLlib have also been greatly expanded.
+In 1.0.0, Spark’s MLlib adds support for sparse feature vectors in Scala, Java, and Python. It takes advantage of sparsity in both storage and computation in linear methods, k-means, and naive Bayes. In addition, this release adds several new algorithms: scalable decision trees for both classification and regression, distributed matrix algorithms including SVD and PCA, model evaluation functions, and L-BFGS as an optimization primitive. The [MLlib programming guide]({{site.url}}docs/latest/mllib-guide.html) and code examples have also been greatly expanded.
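> A small sketch of the new sparse feature vector support in MLlib follows; the sizes, indices, and values are arbitrary example data.
>
> ```scala
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.regression.LabeledPoint
>
> object SparseVectorExample {
>   def main(args: Array[String]): Unit = {
>     // A vector of size 10 with non-zero entries only at indices 1 and 6; linear
>     // methods, k-means, and naive Bayes can exploit this sparsity in storage
>     // and computation.
>     val sv    = Vectors.sparse(10, Array(1, 6), Array(3.0, 5.5))
>     val dense = Vectors.dense(0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 5.5, 0.0, 0.0, 0.0)
>
>     // Labeled examples for classification/regression can wrap either representation.
>     val point = LabeledPoint(1.0, sv)
>     println(point)
>     println(sv.toArray.sameElements(dense.toArray))  // true: same logical vector
>   }
> }
> ```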
### GraphX and Streaming Improvements
In addition to usability and maintainability improvements, GraphX in Spark 1.0 brings substantial performance boosts in graph loading, edge reversal, and neighborhood computation. These operations now require less communication and produce simpler RDD graphs. Spark’s Streaming module has added performance optimizations for stateful stream transformations, along with improved Flume support, and automated state cleanup for long running jobs.
### Extended Java and Python Support
-Spark 1.0 adds support for Java 8 [new lambda syntax](http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/Lambda-QuickStart/index.html#section2) in its Java bindings. Java 8 supports a concise syntax for writing anonymous functions, similar to the closure syntax in Scala and Python. This change requires small changes for users of the current Java API, which are noted in the documentation. Spark’s Python API has been extended to support several new functions. We’ve also included several stability improvements in the Python API, particularly for large datasets. PySpark now supports running on YARN as well.
+Spark 1.0 adds support for Java 8 [new lambda syntax](http://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html) in its Java bindings. Java 8 supports a concise syntax for writing anonymous functions, similar to the closure syntax in Scala and Python. This change requires small changes for users of the current Java API, which are noted in the documentation. Spark’s Python API has been extended to support several new functions. We’ve also included several stability improvements in the Python API, particularly for large datasets. PySpark now supports running on YARN as well.
### Documentation
-Spark’s programming guide has been significantly expanded to centrally cover all supported languages and discuss more operators and aspects of the development life cycle. The MLlib guide has also been expanded with significantly more detail and examples for each algorithm, while documents on configuration, YARN and Mesos have also been revamped.
+Spark's [programming guide]({{site.url}}docs/latest/programming-guide.html) has been significantly expanded to centrally cover all supported languages and discuss more operators and aspects of the development life cycle. The [MLlib guide]({{site.url}}docs/latest/mllib-guide.html) has also been expanded with significantly more detail and examples for each algorithm, while documents on configuration, YARN and Mesos have also been revamped.
### Smaller Changes
- PySpark now works with more Python versions than before -- Python 2.6+ instead of 2.7+, and NumPy 1.4+ instead of 1.7+.
@@ -52,12 +52,12 @@ Spark’s programming guide has been significantly expanded to centrally cover a
- Support for off-heap storage in Tachyon has been added via a special build target.
- Datasets persisted with `DISK_ONLY` now write directly to disk, significantly improving memory usage for large datasets.
- Intermediate state created during a Spark job is now garbage collected when the corresponding RDDs become unreferenced, improving performance.
-- Spark now includes a [Javadoc version]({{site.url}}docs/1.0.0/api/java/index.html) of all its API docs and a [unified Scaladoc]({{site.url}}docs/1.0.0/api/scala/index.html) for all modules.
+- Spark now includes a [Javadoc version]({{site.url}}docs/latest/api/java/index.html) of all its API docs and a [unified Scaladoc]({{site.url}}docs/latest/api/scala/index.html) for all modules.
- A new SparkContext.wholeTextFiles method lets you operate on small text files as individual records.
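> A brief sketch of the SparkContext.wholeTextFiles method noted in the last item above: it yields one (path, contents) record per file rather than one record per line. The directory path here is a placeholder.
>
> ```scala
> import org.apache.spark.{SparkConf, SparkContext}
>
> object WholeTextFilesExample {
>   def main(args: Array[String]): Unit = {
>     val sc = new SparkContext(new SparkConf().setAppName("WholeTextFilesExample"))
>
>     // Each small file under the directory becomes a single (path, contents) record.
>     val files = sc.wholeTextFiles("hdfs:///data/small-text-files")
>     files.map { case (path, contents) => (path, contents.length) }
>          .collect()
>          .foreach { case (path, len) => println(s"$path: $len characters") }
>
>     sc.stop()
>   }
> }
> ```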
### Migrating to Spark 1.0
-While most of the Spark API remains the same as in 0.x versions, a few changes have been made for long-term flexibility, especially in the Java API (to support Java 8 lambdas). The documentation includes [migration information]({{site.url}}docs/1.0.0/programming-guide.html#migrating-from-pre-10-versions-of-spark) to upgrade your applications.
+While most of the Spark API remains the same as in 0.x versions, a few changes have been made for long-term flexibility, especially in the Java API (to support Java 8 lambdas). The documentation includes [migration information]({{site.url}}docs/latest/programming-guide.html#migrating-from-pre-10-versions-of-spark) to upgrade your applications.
### Contributors
The following developers contributed to this release: