Diffstat (limited to 'site/releases/spark-release-1-2-0.html')
-rw-r--r-- site/releases/spark-release-1-2-0.html | 2
1 files changed, 1 insertions, 1 deletions
diff --git a/site/releases/spark-release-1-2-0.html b/site/releases/spark-release-1-2-0.html
index 219990e69..0afe0ca3e 100644
--- a/site/releases/spark-release-1-2-0.html
+++ b/site/releases/spark-release-1-2-0.html
@@ -194,7 +194,7 @@
<p>In 1.2 Spark core upgrades two major subsystems to improve the performance and stability of very large scale shuffles. The first is Spark’s communication manager used during bulk transfers, which upgrades to a <a href="https://issues.apache.org/jira/browse/SPARK-2468">netty-based implementation</a>. The second is Spark’s shuffle mechanism, which upgrades to the <a href="https://issues.apache.org/jira/browse/SPARK-3280">“sort based” shuffle initially released in Spark 1.1</a>. These both improve the performance and stability of very large scale shuffles. Spark also adds an <a href="https://issues.apache.org/jira/browse/SPARK-3174">elastic scaling mechanism</a> designed to improve cluster utilization during long running ETL-style jobs. This is currently supported on YARN and will make its way to other cluster managers in future versions. Finally, Spark 1.2 adds support for Scala 2.11. For instructions on building for Scala 2.11 see the <a href="/docs/1.2.0/building-spark.html#building-for-scala-211">build documentation</a>.</p>
<h3 id="spark-streaming">Spark Streaming</h3>
-<p>This release includes two major feature additions to Spark’s streaming library, a Python API and a write ahead log for full driver H/A. The <a href="https://issues.apache.org/jira/browse/SPARK-2377">Python API</a> covers almost all the DStream transformations and output operations. Input sources based on text files and text over sockets are currently supported. Support for Kafka and Flume input streams in Python will be added in the next release. Second, Spark streaming now features H/A driver support through a <a href="https://issues.apache.org/jira/browse/SPARK-3129">write ahead log (WAL)</a>. In Spark 1.1 and earlier, some buffered (received but not yet processed) data can be lost during driver restarts. To prevent this Spark 1.2 adds an optional WAL, which buffers received data into a fault-tolerant file system (e.g. HDFS). See the <a href="/docs/1.2.0/streaming-programming-guide.html">streaming programming guide</a> for more details. </p>
+<p>This release includes two major feature additions to Spark’s streaming library, a Python API and a write ahead log for full driver H/A. The <a href="https://issues.apache.org/jira/browse/SPARK-2377">Python API</a> covers almost all the DStream transformations and output operations. Input sources based on text files and text over sockets are currently supported. Support for Kafka and Flume input streams in Python will be added in the next release. Second, Spark streaming now features H/A driver support through a <a href="https://issues.apache.org/jira/browse/SPARK-3129">write ahead log (WAL)</a>. In Spark 1.1 and earlier, some buffered (received but not yet processed) data can be lost during driver restarts. To prevent this Spark 1.2 adds an optional WAL, which buffers received data into a fault-tolerant file system (e.g. HDFS). See the <a href="/docs/1.2.0/streaming-programming-guide.html">streaming programming guide</a> for more details.</p>
<h3 id="mllib">MLLib</h3>
<p>Spark 1.2 previews a new set of machine learning API’s in a package called spark.ml that <a href="https://issues.apache.org/jira/browse/SPARK-3530">supports learning pipelines</a>, where multiple algorithms are run in sequence with varying parameters. This type of pipeline is common in practical machine learning deployments. The new ML package uses Spark’s SchemaRDD to represent <a href="https://issues.apache.org/jira/browse/SPARK-3573">ML datasets</a>, providing direct interoperability with Spark SQL. In addition to the new API, Spark 1.2 extends decision trees with two tree ensemble methods: <a href="https://issues.apache.org/jira/browse/SPARK-1545">random forests</a> and <a href="https://issues.apache.org/jira/browse/SPARK-1547">gradient-boosted trees</a>, among the most successful tree-based models for classification and regression. Finally, MLlib&#8217;s Python implementation receives a major update in 1.2 to simplify the process of adding Python APIs, along with better Python API coverage.</p>