Diffstat (limited to 'site/releases/spark-release-1-3-0.html')
 site/releases/spark-release-1-3-0.html | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/site/releases/spark-release-1-3-0.html b/site/releases/spark-release-1-3-0.html
index 07545167f..382ef4d06 100644
--- a/site/releases/spark-release-1-3-0.html
+++ b/site/releases/spark-release-1-3-0.html
@@ -191,7 +191,7 @@
<p>To download Spark 1.3 visit the <a href="/downloads.html">downloads</a> page.</p>
<h3 id="spark-core">Spark Core</h3>
-<p>Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports <a href="https://issues.apache.org/jira/browse/SPARK-5430">multi-level aggregation trees</a> to help speed up expensive reduce operations. <a href="https://issues.apache.org/jira/browse/SPARK-5063">Improved error reporting</a> has been added for certain gotcha operations. Spark&#8217;s Jetty dependency is <a href="https://issues.apache.org/jira/browse/SPARK-3996">now shaded</a> to help avoid conflicts with user programs. Spark now supports <a href="https://issues.apache.org/jira/browse/SPARK-3883">SSL encryption</a> for some communication endpoints. Finally, real-time <a href="https://issues.apache.org/jira/browse/SPARK-3428">GC metrics</a> and <a href="https://issues.apache.org/jira/browse/SPARK-4874">record counts</a> have been added to the UI. </p>
+<p>Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports <a href="https://issues.apache.org/jira/browse/SPARK-5430">multi-level aggregation trees</a> to help speed up expensive reduce operations. <a href="https://issues.apache.org/jira/browse/SPARK-5063">Improved error reporting</a> has been added for certain gotcha operations. Spark&#8217;s Jetty dependency is <a href="https://issues.apache.org/jira/browse/SPARK-3996">now shaded</a> to help avoid conflicts with user programs. Spark now supports <a href="https://issues.apache.org/jira/browse/SPARK-3883">SSL encryption</a> for some communication endpoints. Finally, real-time <a href="https://issues.apache.org/jira/browse/SPARK-3428">GC metrics</a> and <a href="https://issues.apache.org/jira/browse/SPARK-4874">record counts</a> have been added to the UI.</p>
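For context on the multi-level aggregation trees (SPARK-5430), here is a minimal Scala sketch using `treeAggregate`, which performs the reduction through a tree of the given depth rather than sending every partition's partial result straight to the driver. The dataset, partition count, and depth below are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("tree-agg").setMaster("local[*]"))

// Hypothetical workload: sum of squares over many partitions.
val data = sc.parallelize(1 to 1000000, numSlices = 100)

val sumOfSquares = data.treeAggregate(0L)(
  (acc, x) => acc + x.toLong * x, // fold one element into a partial sum
  (a, b) => a + b,                // merge two partial sums
  depth = 2                       // aggregate through a two-level tree
)
println(sumOfSquares)
```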
<h3 id="dataframe-api">DataFrame API</h3>
<p>Spark 1.3 adds a new <a href="/docs/1.3.0/sql-programming-guide.html#dataframes">DataFrames API</a> that provides powerful and convenient operators for working with structured datasets. The DataFrame is an evolution of the base RDD API that includes named fields along with schema information. It’s easy to construct a DataFrame from sources such as Hive tables, JSON data, a JDBC database, or any implementation of Spark’s new data source API. DataFrames will become a common interchange format between Spark components and when importing and exporting data to other systems. DataFrames are supported in Python, Scala, and Java.</p>
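A minimal Scala sketch of the DataFrame API described above, assuming a hypothetical `people.json` file with `name` and `age` fields (`jsonFile` was the JSON entry point in the 1.3 API):

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc) // sc: an existing SparkContext

// Hypothetical input file; each line is a JSON object with name and age.
val df = sqlContext.jsonFile("people.json")

df.filter(df("age") > 21)
  .groupBy("age")
  .count()
  .show()
```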
@@ -203,7 +203,7 @@
<p>In this release Spark MLlib introduces several new algorithms: latent Dirichlet allocation (LDA) for <a href="https://issues.apache.org/jira/browse/SPARK-1405">topic modeling</a>, <a href="https://issues.apache.org/jira/browse/SPARK-2309">multinomial logistic regression</a> for multiclass classification, <a href="https://issues.apache.org/jira/browse/SPARK-5012">Gaussian mixture model (GMM)</a> and <a href="https://issues.apache.org/jira/browse/SPARK-4259">power iteration clustering</a> for clustering, <a href="https://issues.apache.org/jira/browse/SPARK-4001">FP-growth</a> for frequent pattern mining, and <a href="https://issues.apache.org/jira/browse/SPARK-4409">block matrix abstraction</a> for distributed linear algebra. Initial support has been added for <a href="https://issues.apache.org/jira/browse/SPARK-4587">model import/export</a> in an exchangeable format, which will be expanded in future versions to cover more model types in Java/Python/Scala. The implementations of k-means and ALS receive updates (<a href="https://issues.apache.org/jira/browse/SPARK-3424">SPARK-3424</a>, <a href="https://issues.apache.org/jira/browse/SPARK-3541">SPARK-3541</a>) that lead to significant performance gains. PySpark now supports the <a href="https://issues.apache.org/jira/browse/SPARK-4586">ML pipeline API</a> added in Spark 1.2, as well as <a href="https://issues.apache.org/jira/browse/SPARK-5094">gradient-boosted trees</a> and <a href="https://issues.apache.org/jira/browse/SPARK-5012">Gaussian mixture models</a>. Finally, the ML pipeline API has been ported to support the new DataFrames abstraction.</p>
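To pick one algorithm from the list above, a minimal Scala sketch of fitting the new Gaussian mixture model (SPARK-5012); the toy points are made up for illustration:

```scala
import org.apache.spark.mllib.clustering.GaussianMixture
import org.apache.spark.mllib.linalg.Vectors

// Two obvious clusters of hypothetical 2-D points.
val points = sc.parallelize(Seq(
  Vectors.dense(0.1, 0.2), Vectors.dense(0.3, 0.1),
  Vectors.dense(9.0, 8.5), Vectors.dense(8.8, 9.1)
))

val gmm = new GaussianMixture().setK(2).run(points)
gmm.gaussians.zip(gmm.weights).foreach { case (g, w) =>
  println(s"weight=$w mean=${g.mu} cov=${g.sigma}")
}
```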
<h3 id="spark-streaming">Spark Streaming</h3>
-<p>Spark 1.3 introduces a new <a href="https://issues.apache.org/jira/browse/SPARK-4964"><em>direct</em> Kafka API</a> (<a href="http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html">docs</a>) which enables exactly-once delivery without the use of write-ahead logs. It also adds a <a href="https://issues.apache.org/jira/browse/SPARK-5047">Python Kafka API</a> along with infrastructure for additional Python APIs in future releases. An online version of <a href="https://issues.apache.org/jira/browse/SPARK-4979">logistic regression</a> and the ability to read <a href="https://issues.apache.org/jira/browse/SPARK-4969">binary records</a> have also been added. For stateful operations, support has been added for loading an <a href="https://issues.apache.org/jira/browse/SPARK-3660">initial state RDD</a>. Finally, the streaming programming guide has been updated to include information about SQL and DataFrame operations within streaming applications, along with important clarifications to the fault-tolerance semantics. </p>
+<p>Spark 1.3 introduces a new <a href="https://issues.apache.org/jira/browse/SPARK-4964"><em>direct</em> Kafka API</a> (<a href="http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html">docs</a>) which enables exactly-once delivery without the use of write-ahead logs. It also adds a <a href="https://issues.apache.org/jira/browse/SPARK-5047">Python Kafka API</a> along with infrastructure for additional Python APIs in future releases. An online version of <a href="https://issues.apache.org/jira/browse/SPARK-4979">logistic regression</a> and the ability to read <a href="https://issues.apache.org/jira/browse/SPARK-4969">binary records</a> have also been added. For stateful operations, support has been added for loading an <a href="https://issues.apache.org/jira/browse/SPARK-3660">initial state RDD</a>. Finally, the streaming programming guide has been updated to include information about SQL and DataFrame operations within streaming applications, along with important clarifications to the fault-tolerance semantics.</p>
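A minimal Scala sketch of the direct Kafka stream described above (it requires the spark-streaming-kafka artifact on the classpath); the broker list and topic name are hypothetical:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(sc, Seconds(10))

// Hypothetical broker list and topic; offsets are tracked by Spark itself,
// so no receiver or write-ahead log is involved.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events"))

stream.map(_._2).count().print() // message payloads per batch
ssc.start()
ssc.awaitTermination()
```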
<h3 id="graphx">GraphX</h3>
<p>GraphX adds a handful of utility functions in this release, including conversion into a <a href="https://issues.apache.org/jira/browse/SPARK-4917">canonical edge graph</a>.</p>
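A small Scala sketch of the canonical edge graph conversion (SPARK-4917), which rewrites every edge so that srcId &lt; dstId and merges the duplicates that result; the toy graph is hypothetical:

```scala
import org.apache.spark.graphx.{Edge, Graph}

// Hypothetical graph with edges in both directions between vertices 1 and 2.
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 1L, 1), Edge(2L, 3L, 1)))
val graph = Graph.fromEdges(edges, defaultValue = 0)

// Merge attribute values of edges that collapse onto the same canonical edge.
val canonical = graph.convertToCanonicalEdges(_ + _)
canonical.edges.collect().foreach(println)
```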
@@ -219,7 +219,7 @@
<ul>
<li><a href="https://issues.apache.org/jira/browse/SPARK-6194">SPARK-6194</a>: A memory leak in PySpark&#8217;s <code>collect()</code>.</li>
<li><a href="https://issues.apache.org/jira/browse/SPARK-6222">SPARK-6222</a>: An issue with failure recovery in Spark Streaming.</li>
- <li><a href="https://issues.apache.org/jira/browse/SPARK-6315">SPARK-6315</a>: Spark SQL can&#8217;t read Parquet data generated with Spark 1.1. </li>
+ <li><a href="https://issues.apache.org/jira/browse/SPARK-6315">SPARK-6315</a>: Spark SQL can&#8217;t read Parquet data generated with Spark 1.1.</li>
<li><a href="https://issues.apache.org/jira/browse/SPARK-6247">SPARK-6247</a>: Errors analyzing certain join types in Spark SQL.</li>
</ul>