author     Sean Owen <sowen@cloudera.com>  2016-08-31 12:38:33 +0100
committer  Sean Owen <sowen@cloudera.com>  2016-08-31 12:38:33 +0100
commit     0845f49def1dd1fc5fb439d2f1e22f03297944ed (patch)
tree       587fddee53464850254e2b65a4b2aa0d7b2a2443
parent     fcd0bc3dd9812c263a17bd4df4e85ce3a89d2b8b (diff)
Re-sync Spark site HTML to output of latest jekyll
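
Most of the churn below is mechanical: the newer Jekyll (with the Rouge highlighter) wraps highlighted code in <figure class="highlight"> rather than <div class="highlight">, and the regenerated pages also drop stray trailing whitespace and blank lines inside entry bodies. A minimal sketch of the markup change (abbreviated; the real pages carry full token-level <span> highlighting, as the diff shows):

    <!-- output of the older Jekyll -->
    <div class="highlight"><pre><code class="language-python" data-lang="python">counts.saveAsTextFile("hdfs://...")</code></pre></div>

    <!-- output of the newer Jekyll / Rouge -->
    <figure class="highlight"><pre><code class="language-python" data-lang="python">counts.saveAsTextFile("hdfs://...")</code></pre></figure>

To reproduce, rebuild the site from its Markdown sources (assuming the repository's usual workflow, a plain "jekyll build" that regenerates the HTML under site/) and commit the regenerated pages.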
-rw-r--r--  site/documentation.html                               5
-rw-r--r--  site/examples.html                                   60
-rw-r--r--  site/news/index.html                                 44
-rw-r--r--  site/news/spark-0-9-1-released.html                   2
-rw-r--r--  site/news/spark-0-9-2-released.html                   2
-rw-r--r--  site/news/spark-1-1-0-released.html                   2
-rw-r--r--  site/news/spark-1-2-2-released.html                   2
-rw-r--r--  site/news/spark-and-shark-in-the-news.html            2
-rw-r--r--  site/news/spark-summit-east-2015-videos-posted.html   2
-rw-r--r--  site/releases/spark-release-0-8-0.html                4
-rw-r--r--  site/releases/spark-release-0-9-1.html               20
-rw-r--r--  site/releases/spark-release-1-0-1.html                8
-rw-r--r--  site/releases/spark-release-1-0-2.html                2
-rw-r--r--  site/releases/spark-release-1-1-0.html                6
-rw-r--r--  site/releases/spark-release-1-2-0.html                2
-rw-r--r--  site/releases/spark-release-1-3-0.html                6
-rw-r--r--  site/releases/spark-release-1-3-1.html                6
-rw-r--r--  site/releases/spark-release-1-4-0.html                4
-rw-r--r--  site/releases/spark-release-1-5-0.html               30
-rw-r--r--  site/releases/spark-release-1-6-0.html               20
-rw-r--r--  site/releases/spark-release-2-0-0.html               36
21 files changed, 118 insertions, 147 deletions
diff --git a/site/documentation.html b/site/documentation.html
index 652281d11..859b76708 100644
--- a/site/documentation.html
+++ b/site/documentation.html
@@ -253,12 +253,13 @@
</ul>
<h4><a name="meetup-videos"></a>Meetup Talk Videos</h4>
-<p>In addition to the videos listed below, you can also view <a href="http://www.meetup.com/spark-users/files/">all slides from Bay Area meetups here</a>.
+<p>In addition to the videos listed below, you can also view <a href="http://www.meetup.com/spark-users/files/">all slides from Bay Area meetups here</a>.</p>
<style type="text/css">
.video-meta-info {
font-size: 0.95em;
}
-</style></p>
+</style>
+
<ul>
<li><a href="http://www.youtube.com/watch?v=NUQ-8to2XAk&amp;list=PL-x35fyliRwiP3YteXbnhk0QGOtYLBT3a">Spark 1.0 and Beyond</a> (<a href="http://files.meetup.com/3138542/Spark%201.0%20Meetup.ppt">slides</a>) <span class="video-meta-info">by Patrick Wendell, at Cisco in San Jose, 2014-04-23</span></li>
diff --git a/site/examples.html b/site/examples.html
index 5431f5dda..1be96be50 100644
--- a/site/examples.html
+++ b/site/examples.html
@@ -213,11 +213,11 @@ In this page, we will show examples using RDD API as well as examples using high
<div class="tab-pane tab-pane-python active">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">text_file</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="p">)</span>
+<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">text_file</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="p">)</span>
<span class="n">counts</span> <span class="o">=</span> <span class="n">text_file</span><span class="o">.</span><span class="n">flatMap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">))</span> \
<span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">word</span><span class="p">:</span> <span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span> \
<span class="o">.</span><span class="n">reduceByKey</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">)</span>
-<span class="n">counts</span><span class="o">.</span><span class="n">saveAsTextFile</span><span class="p">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="p">)</span></code></pre></div>
+<span class="n">counts</span><span class="o">.</span><span class="n">saveAsTextFile</span><span class="p">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="p">)</span></code></pre></figure>
</div>
</div>
@@ -225,11 +225,11 @@ In this page, we will show examples using RDD API as well as examples using high
<div class="tab-pane tab-pane-scala">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">textFile</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="o">)</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">textFile</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="o">)</span>
<span class="k">val</span> <span class="n">counts</span> <span class="k">=</span> <span class="n">textFile</span><span class="o">.</span><span class="n">flatMap</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">))</span>
<span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">word</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">word</span><span class="o">,</span> <span class="mi">1</span><span class="o">))</span>
<span class="o">.</span><span class="n">reduceByKey</span><span class="o">(</span><span class="k">_</span> <span class="o">+</span> <span class="k">_</span><span class="o">)</span>
-<span class="n">counts</span><span class="o">.</span><span class="n">saveAsTextFile</span><span class="o">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="o">)</span></code></pre></div>
+<span class="n">counts</span><span class="o">.</span><span class="n">saveAsTextFile</span><span class="o">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="o">)</span></code></pre></figure>
</div>
</div>
@@ -237,7 +237,7 @@ In this page, we will show examples using RDD API as well as examples using high
<div class="tab-pane tab-pane-java">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">textFile</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">textFile</span><span class="o">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="o">);</span>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">textFile</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">textFile</span><span class="o">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="o">);</span>
<span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">words</span> <span class="o">=</span> <span class="n">textFile</span><span class="o">.</span><span class="na">flatMap</span><span class="o">(</span><span class="k">new</span> <span class="n">FlatMapFunction</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;()</span> <span class="o">{</span>
<span class="kd">public</span> <span class="n">Iterable</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="nf">call</span><span class="o">(</span><span class="n">String</span> <span class="n">s</span><span class="o">)</span> <span class="o">{</span> <span class="k">return</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">s</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">));</span> <span class="o">}</span>
<span class="o">});</span>
@@ -247,7 +247,7 @@ In this page, we will show examples using RDD API as well as examples using high
<span class="n">JavaPairRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">counts</span> <span class="o">=</span> <span class="n">pairs</span><span class="o">.</span><span class="na">reduceByKey</span><span class="o">(</span><span class="k">new</span> <span class="n">Function2</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;()</span> <span class="o">{</span>
<span class="kd">public</span> <span class="n">Integer</span> <span class="nf">call</span><span class="o">(</span><span class="n">Integer</span> <span class="n">a</span><span class="o">,</span> <span class="n">Integer</span> <span class="n">b</span><span class="o">)</span> <span class="o">{</span> <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">;</span> <span class="o">}</span>
<span class="o">});</span>
-<span class="n">counts</span><span class="o">.</span><span class="na">saveAsTextFile</span><span class="o">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="o">);</span></code></pre></div>
+<span class="n">counts</span><span class="o">.</span><span class="na">saveAsTextFile</span><span class="o">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="o">);</span></code></pre></figure>
</div>
</div>
@@ -266,13 +266,13 @@ In this page, we will show examples using RDD API as well as examples using high
<div class="tab-pane tab-pane-python active">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">sample</span><span class="p">(</span><span class="n">p</span><span class="p">):</span>
+<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">sample</span><span class="p">(</span><span class="n">p</span><span class="p">):</span>
<span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">random</span><span class="p">(),</span> <span class="n">random</span><span class="p">()</span>
<span class="k">return</span> <span class="mi">1</span> <span class="k">if</span> <span class="n">x</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">*</span><span class="n">y</span> <span class="o">&lt;</span> <span class="mi">1</span> <span class="k">else</span> <span class="mi">0</span>
<span class="n">count</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">(</span><span class="nb">xrange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">NUM_SAMPLES</span><span class="p">))</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span> \
<span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">)</span>
-<span class="k">print</span> <span class="s">&quot;Pi is roughly </span><span class="si">%f</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="mf">4.0</span> <span class="o">*</span> <span class="n">count</span> <span class="o">/</span> <span class="n">NUM_SAMPLES</span><span class="p">)</span></code></pre></div>
+<span class="k">print</span> <span class="s">&quot;Pi is roughly </span><span class="si">%f</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="mf">4.0</span> <span class="o">*</span> <span class="n">count</span> <span class="o">/</span> <span class="n">NUM_SAMPLES</span><span class="p">)</span></code></pre></figure>
</div>
</div>
@@ -280,12 +280,12 @@ In this page, we will show examples using RDD API as well as examples using high
<div class="tab-pane tab-pane-scala">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">count</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span><span class="mi">1</span> <span class="n">to</span> <span class="nc">NUM_SAMPLES</span><span class="o">).</span><span class="n">map</span><span class="o">{</span><span class="n">i</span> <span class="k">=&gt;</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">count</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span><span class="mi">1</span> <span class="n">to</span> <span class="nc">NUM_SAMPLES</span><span class="o">).</span><span class="n">map</span><span class="o">{</span><span class="n">i</span> <span class="k">=&gt;</span>
<span class="k">val</span> <span class="n">x</span> <span class="k">=</span> <span class="nc">Math</span><span class="o">.</span><span class="n">random</span><span class="o">()</span>
<span class="k">val</span> <span class="n">y</span> <span class="k">=</span> <span class="nc">Math</span><span class="o">.</span><span class="n">random</span><span class="o">()</span>
<span class="k">if</span> <span class="o">(</span><span class="n">x</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">*</span><span class="n">y</span> <span class="o">&lt;</span> <span class="mi">1</span><span class="o">)</span> <span class="mi">1</span> <span class="k">else</span> <span class="mi">0</span>
<span class="o">}.</span><span class="n">reduce</span><span class="o">(</span><span class="k">_</span> <span class="o">+</span> <span class="k">_</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="s">&quot;Pi is roughly &quot;</span> <span class="o">+</span> <span class="mf">4.0</span> <span class="o">*</span> <span class="n">count</span> <span class="o">/</span> <span class="nc">NUM_SAMPLES</span><span class="o">)</span></code></pre></div>
+<span class="n">println</span><span class="o">(</span><span class="s">&quot;Pi is roughly &quot;</span> <span class="o">+</span> <span class="mf">4.0</span> <span class="o">*</span> <span class="n">count</span> <span class="o">/</span> <span class="nc">NUM_SAMPLES</span><span class="o">)</span></code></pre></figure>
</div>
</div>
@@ -293,7 +293,7 @@ In this page, we will show examples using RDD API as well as examples using high
<div class="tab-pane tab-pane-java">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">List</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">l</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;(</span><span class="n">NUM_SAMPLES</span><span class="o">);</span>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">List</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">l</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;(</span><span class="n">NUM_SAMPLES</span><span class="o">);</span>
<span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">NUM_SAMPLES</span><span class="o">;</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
<span class="n">l</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">i</span><span class="o">);</span>
<span class="o">}</span>
@@ -305,7 +305,7 @@ In this page, we will show examples using RDD API as well as examples using high
<span class="k">return</span> <span class="n">x</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">*</span><span class="n">y</span> <span class="o">&lt;</span> <span class="mi">1</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}).</span><span class="na">count</span><span class="o">();</span>
-<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Pi is roughly &quot;</span> <span class="o">+</span> <span class="mf">4.0</span> <span class="o">*</span> <span class="n">count</span> <span class="o">/</span> <span class="n">NUM_SAMPLES</span><span class="o">);</span></code></pre></div>
+<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Pi is roughly &quot;</span> <span class="o">+</span> <span class="mf">4.0</span> <span class="o">*</span> <span class="n">count</span> <span class="o">/</span> <span class="n">NUM_SAMPLES</span><span class="o">);</span></code></pre></figure>
</div>
</div>
@@ -333,7 +333,7 @@ Also, programs based on DataFrame API will be automatically optimized by Spark’s
<div class="tab-pane tab-pane-python active">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">textFile</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="p">)</span>
+<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">textFile</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="p">)</span>
<span class="c"># Creates a DataFrame having a single column named &quot;line&quot;</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">textFile</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">r</span><span class="p">:</span> <span class="n">Row</span><span class="p">(</span><span class="n">r</span><span class="p">))</span><span class="o">.</span><span class="n">toDF</span><span class="p">([</span><span class="s">&quot;line&quot;</span><span class="p">])</span>
@@ -343,7 +343,7 @@ Also, programs based on DataFrame API will be automatically optimized by Spark’s
<span class="c"># Counts errors mentioning MySQL</span>
<span class="n">errors</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s">&quot;line&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">like</span><span class="p">(</span><span class="s">&quot;%MySQL%&quot;</span><span class="p">))</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
<span class="c"># Fetches the MySQL errors as an array of strings</span>
-<span class="n">errors</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s">&quot;line&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">like</span><span class="p">(</span><span class="s">&quot;%MySQL%&quot;</span><span class="p">))</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span></code></pre></div>
+<span class="n">errors</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s">&quot;line&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">like</span><span class="p">(</span><span class="s">&quot;%MySQL%&quot;</span><span class="p">))</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span></code></pre></figure>
</div>
</div>
@@ -351,7 +351,7 @@ Also, programs based on DataFrame API will be automatically optimized by Spark’s
<div class="tab-pane tab-pane-scala">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">textFile</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="o">)</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">textFile</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="o">)</span>
<span class="c1">// Creates a DataFrame having a single column named &quot;line&quot;</span>
<span class="k">val</span> <span class="n">df</span> <span class="k">=</span> <span class="n">textFile</span><span class="o">.</span><span class="n">toDF</span><span class="o">(</span><span class="s">&quot;line&quot;</span><span class="o">)</span>
@@ -361,7 +361,7 @@ Also, programs based on DataFrame API will be automatically optimized by Spark’s
<span class="c1">// Counts errors mentioning MySQL</span>
<span class="n">errors</span><span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="n">col</span><span class="o">(</span><span class="s">&quot;line&quot;</span><span class="o">).</span><span class="n">like</span><span class="o">(</span><span class="s">&quot;%MySQL%&quot;</span><span class="o">)).</span><span class="n">count</span><span class="o">()</span>
<span class="c1">// Fetches the MySQL errors as an array of strings</span>
-<span class="n">errors</span><span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="n">col</span><span class="o">(</span><span class="s">&quot;line&quot;</span><span class="o">).</span><span class="n">like</span><span class="o">(</span><span class="s">&quot;%MySQL%&quot;</span><span class="o">)).</span><span class="n">collect</span><span class="o">()</span></code></pre></div>
+<span class="n">errors</span><span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="n">col</span><span class="o">(</span><span class="s">&quot;line&quot;</span><span class="o">).</span><span class="n">like</span><span class="o">(</span><span class="s">&quot;%MySQL%&quot;</span><span class="o">)).</span><span class="n">collect</span><span class="o">()</span></code></pre></figure>
</div>
</div>
@@ -369,7 +369,7 @@ Also, programs based on DataFrame API will be automatically optimized by Spark’s
<div class="tab-pane tab-pane-java">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Creates a DataFrame having a single column named &quot;line&quot;</span>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Creates a DataFrame having a single column named &quot;line&quot;</span>
<span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">textFile</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">textFile</span><span class="o">(</span><span class="s">&quot;hdfs://...&quot;</span><span class="o">);</span>
<span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">rowRDD</span> <span class="o">=</span> <span class="n">textFile</span><span class="o">.</span><span class="na">map</span><span class="o">(</span>
<span class="k">new</span> <span class="n">Function</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Row</span><span class="o">&gt;()</span> <span class="o">{</span>
@@ -388,7 +388,7 @@ Also, programs based on DataFrame API will be automatically optimized by Spark’s
<span class="c1">// Counts errors mentioning MySQL</span>
<span class="n">errors</span><span class="o">.</span><span class="na">filter</span><span class="o">(</span><span class="n">col</span><span class="o">(</span><span class="s">&quot;line&quot;</span><span class="o">).</span><span class="na">like</span><span class="o">(</span><span class="s">&quot;%MySQL%&quot;</span><span class="o">)).</span><span class="na">count</span><span class="o">();</span>
<span class="c1">// Fetches the MySQL errors as an array of strings</span>
-<span class="n">errors</span><span class="o">.</span><span class="na">filter</span><span class="o">(</span><span class="n">col</span><span class="o">(</span><span class="s">&quot;line&quot;</span><span class="o">).</span><span class="na">like</span><span class="o">(</span><span class="s">&quot;%MySQL%&quot;</span><span class="o">)).</span><span class="na">collect</span><span class="o">();</span></code></pre></div>
+<span class="n">errors</span><span class="o">.</span><span class="na">filter</span><span class="o">(</span><span class="n">col</span><span class="o">(</span><span class="s">&quot;line&quot;</span><span class="o">).</span><span class="na">like</span><span class="o">(</span><span class="s">&quot;%MySQL%&quot;</span><span class="o">)).</span><span class="na">collect</span><span class="o">();</span></code></pre></figure>
</div>
</div>
@@ -412,7 +412,7 @@ A simple MySQL table "people" is used in the example and this table has two colu
<div class="tab-pane tab-pane-python active">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># Creates a DataFrame based on a table named &quot;people&quot;</span>
+<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># Creates a DataFrame based on a table named &quot;people&quot;</span>
<span class="c"># stored in a MySQL database.</span>
<span class="n">url</span> <span class="o">=</span> \
<span class="s">&quot;jdbc:mysql://yourIP:yourPort/test?user=yourUsername;password=yourPassword&quot;</span>
@@ -431,7 +431,7 @@ A simple MySQL table "people" is used in the example and this table has two colu
<span class="n">countsByAge</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
<span class="c"># Saves countsByAge to S3 in the JSON format.</span>
-<span class="n">countsByAge</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;json&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s">&quot;s3a://...&quot;</span><span class="p">)</span></code></pre></div>
+<span class="n">countsByAge</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;json&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s">&quot;s3a://...&quot;</span><span class="p">)</span></code></pre></figure>
</div>
</div>
@@ -439,7 +439,7 @@ A simple MySQL table "people" is used in the example and this table has two colu
<div class="tab-pane tab-pane-scala">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Creates a DataFrame based on a table named &quot;people&quot;</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Creates a DataFrame based on a table named &quot;people&quot;</span>
<span class="c1">// stored in a MySQL database.</span>
<span class="k">val</span> <span class="n">url</span> <span class="k">=</span>
<span class="s">&quot;jdbc:mysql://yourIP:yourPort/test?user=yourUsername;password=yourPassword&quot;</span>
@@ -458,7 +458,7 @@ A simple MySQL table "people" is used in the example and this table has two colu
<span class="n">countsByAge</span><span class="o">.</span><span class="n">show</span><span class="o">()</span>
<span class="c1">// Saves countsByAge to S3 in the JSON format.</span>
-<span class="n">countsByAge</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">format</span><span class="o">(</span><span class="s">&quot;json&quot;</span><span class="o">).</span><span class="n">save</span><span class="o">(</span><span class="s">&quot;s3a://...&quot;</span><span class="o">)</span></code></pre></div>
+<span class="n">countsByAge</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">format</span><span class="o">(</span><span class="s">&quot;json&quot;</span><span class="o">).</span><span class="n">save</span><span class="o">(</span><span class="s">&quot;s3a://...&quot;</span><span class="o">)</span></code></pre></figure>
</div>
</div>
@@ -466,7 +466,7 @@ A simple MySQL table "people" is used in the example and this table has two colu
<div class="tab-pane tab-pane-java">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Creates a DataFrame based on a table named &quot;people&quot;</span>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Creates a DataFrame based on a table named &quot;people&quot;</span>
<span class="c1">// stored in a MySQL database.</span>
<span class="n">String</span> <span class="n">url</span> <span class="o">=</span>
<span class="s">&quot;jdbc:mysql://yourIP:yourPort/test?user=yourUsername;password=yourPassword&quot;</span><span class="o">;</span>
@@ -485,7 +485,7 @@ A simple MySQL table "people" is used in the example and this table has two colu
<span class="n">countsByAge</span><span class="o">.</span><span class="na">show</span><span class="o">();</span>
<span class="c1">// Saves countsByAge to S3 in the JSON format.</span>
-<span class="n">countsByAge</span><span class="o">.</span><span class="na">write</span><span class="o">().</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;json&quot;</span><span class="o">).</span><span class="na">save</span><span class="o">(</span><span class="s">&quot;s3a://...&quot;</span><span class="o">);</span></code></pre></div>
+<span class="n">countsByAge</span><span class="o">.</span><span class="na">write</span><span class="o">().</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;json&quot;</span><span class="o">).</span><span class="na">save</span><span class="o">(</span><span class="s">&quot;s3a://...&quot;</span><span class="o">);</span></code></pre></figure>
</div>
</div>
@@ -516,7 +516,7 @@ We learn to predict the labels from feature vectors using the Logistic Regressio
<div class="tab-pane tab-pane-python active">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># Every record of this DataFrame contains the label and</span>
+<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># Every record of this DataFrame contains the label and</span>
<span class="c"># features represented by a vector.</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">sqlContext</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="p">[</span><span class="s">&quot;label&quot;</span><span class="p">,</span> <span class="s">&quot;features&quot;</span><span class="p">])</span>
@@ -528,7 +528,7 @@ We learn to predict the labels from feature vectors using the Logistic Regressio
<span class="n">model</span> <span class="o">=</span> <span class="n">lr</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
<span class="c"># Given a dataset, predict each point&#39;s label, and show the results.</span>
-<span class="n">model</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span></code></pre></div>
+<span class="n">model</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span></code></pre></figure>
</div>
</div>
@@ -536,7 +536,7 @@ We learn to predict the labels from feature vectors using the Logistic Regressio
<div class="tab-pane tab-pane-scala">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Every record of this DataFrame contains the label and</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Every record of this DataFrame contains the label and</span>
<span class="c1">// features represented by a vector.</span>
<span class="k">val</span> <span class="n">df</span> <span class="k">=</span> <span class="n">sqlContext</span><span class="o">.</span><span class="n">createDataFrame</span><span class="o">(</span><span class="n">data</span><span class="o">).</span><span class="n">toDF</span><span class="o">(</span><span class="s">&quot;label&quot;</span><span class="o">,</span> <span class="s">&quot;features&quot;</span><span class="o">)</span>
@@ -551,7 +551,7 @@ We learn to predict the labels from feature vectors using the Logistic Regressio
<span class="k">val</span> <span class="n">weights</span> <span class="k">=</span> <span class="n">model</span><span class="o">.</span><span class="n">weights</span>
<span class="c1">// Given a dataset, predict each point&#39;s label, and show the results.</span>
-<span class="n">model</span><span class="o">.</span><span class="n">transform</span><span class="o">(</span><span class="n">df</span><span class="o">).</span><span class="n">show</span><span class="o">()</span></code></pre></div>
+<span class="n">model</span><span class="o">.</span><span class="n">transform</span><span class="o">(</span><span class="n">df</span><span class="o">).</span><span class="n">show</span><span class="o">()</span></code></pre></figure>
</div>
</div>
@@ -559,7 +559,7 @@ We learn to predict the labels from feature vectors using the Logistic Regressio
<div class="tab-pane tab-pane-java">
<div class="code code-tab">
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Every record of this DataFrame contains the label and</span>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Every record of this DataFrame contains the label and</span>
<span class="c1">// features represented by a vector.</span>
<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
<span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span><span class="s">&quot;label&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">DoubleType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">()),</span>
@@ -578,7 +578,7 @@ We learn to predict the labels from feature vectors using the Logistic Regressio
<span class="n">Vector</span> <span class="n">weights</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="na">weights</span><span class="o">();</span>
<span class="c1">// Given a dataset, predict each point&#39;s label, and show the results.</span>
-<span class="n">model</span><span class="o">.</span><span class="na">transform</span><span class="o">(</span><span class="n">df</span><span class="o">).</span><span class="na">show</span><span class="o">();</span></code></pre></div>
+<span class="n">model</span><span class="o">.</span><span class="na">transform</span><span class="o">(</span><span class="n">df</span><span class="o">).</span><span class="na">show</span><span class="o">();</span></code></pre></figure>
</div>
</div>
diff --git a/site/news/index.html b/site/news/index.html
index 0a099ed9d..a7eef4765 100644
--- a/site/news/index.html
+++ b/site/news/index.html
@@ -191,7 +191,6 @@
<div class="entry-date">July 26, 2016</div>
</header>
<div class="entry-content"><p>We are happy to announce the availability of <a href="/releases/spark-release-2-0-0.html" title="Spark Release 2.0.0">Spark 2.0.0</a>! Visit the <a href="/releases/spark-release-2-0-0.html" title="Spark Release 2.0.0">release notes</a> to read about the new features, or <a href="/downloads.html">download</a> the release today.</p>
-
</div>
</article>
@@ -211,7 +210,6 @@
<div class="entry-date">June 16, 2016</div>
</header>
<div class="entry-content"><p>Call for presentations is now open for <a href="https://spark-summit.org/eu-2016/">Spark Summit EU</a>! The event will take place on October 25-27 in Brussels. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, spark ecosystem and research. Please submit by July 1 to be considered.</p>
-
</div>
</article>
@@ -231,7 +229,6 @@
<div class="entry-date">April 17, 2016</div>
</header>
<div class="entry-content"><p>The agenda for <a href="https://spark-summit.org/2016/">Spark Summit 2016</a> is now available! The summit kicks off on June 6th with a full day of Spark training followed by over 90+ talks featuring speakers from Airbnb, Baidu, Bloomberg, Databricks, Duke, IBM, Microsoft, Netflix, Uber, UC Berkeley. Check out the full <a href="https://spark-summit.org/2016/schedule/">schedule</a> and <a href="https://spark-summit.org/2016/register/">register</a> to attend!</p>
-
</div>
</article>
@@ -251,7 +248,6 @@
<div class="entry-date">February 11, 2016</div>
</header>
<div class="entry-content"><p>Call for presentations is now open for <a href="https://spark-summit.org/2016/">Spark Summit San Francisco</a>! The event will take place on June 6-8 in San Francisco. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, business value, spark ecosystem and research. Please submit by February 29th to be considered.</p>
-
</div>
</article>
@@ -261,7 +257,6 @@
<div class="entry-date">January 14, 2016</div>
</header>
<div class="entry-content"><p>The <a href="https://spark-summit.org/east-2016/schedule/">agenda for Spark Summit East</a> is now posted, with 60 talks from organizations including Netflix, Comcast, Blackrock, Bloomberg and others. The 2nd annual Spark Summit East will run February 16-18th in NYC and feature a full program of speakers along with Spark training opportunities. More details are available on the <a href="https://spark-summit.org/east-2016/schedule/">Spark Summit East website</a>, where you can also <a href="http://www.prevalentdesignevents.com/sparksummit2016/east/registration.aspx?source=header">register to attend</a>.</p>
-
</div>
</article>
@@ -284,7 +279,6 @@ With this release the Spark community continues to grow, with contributions from
<div class="entry-date">November 19, 2015</div>
</header>
<div class="entry-content"><p>Call for presentations is closing soon for <a href="https://spark-summit.org/east-2016/">Spark Summit East</a>! The event will take place on February 16th-18th in New York City. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, and research. Please submit by November 22nd to be considered.</p>
-
</div>
</article>
@@ -304,7 +298,6 @@ With this release the Spark community continues to grow, with contributions from
<div class="entry-date">October 14, 2015</div>
</header>
<div class="entry-content"><p>Abstract submissions are now open for the 2nd <a href="https://spark-summit.org/east-2016/">Spark Summit East</a>! The event will take place on February 16th-18th in New York City. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, and research.</p>
-
</div>
</article>
@@ -334,7 +327,6 @@ With this release the Spark community continues to grow, with contributions from
<div class="entry-date">September 7, 2015</div>
</header>
<div class="entry-content"><p>The <a href="http://spark-summit.org/eu-2015/schedule">agenda for Spark Summit Europe</a> is now posted, with 38 talks from organizations including Barclays, Netflix, Elsevier, Intel and others. This inaugural Spark conference in Europe will run October 27th-29th 2015 in Amsterdam and feature a full program of speakers along with Spark training opportunities. More details are available on the <a href="https://spark-summit.org/eu-2015/">Spark Summit Europe website</a>, where you can also <a href="https://www.prevalentdesignevents.com/sparksummit2015/europe/registration.aspx?source=header">register</a> to attend.</p>
-
</div>
</article>
@@ -354,7 +346,6 @@ With this release the Spark community continues to grow, with contributions from
<div class="entry-date">June 29, 2015</div>
</header>
<div class="entry-content"><p>The videos and slides for Spark Summit 2015 are now all <a href="http://spark-summit.org/2015/#day-1">available online</a>! The talks include technical roadmap discussions, deep dives on Spark components, and use cases built on top of Spark.</p>
-
</div>
</article>
@@ -371,7 +362,7 @@ With this release the Spark community continues to grow, with contributions from
<article class="hentry">
<header class="entry-header">
<h3 class="entry-title"><a href="/news/one-month-to-spark-summit-2015.html">One month to Spark Summit 2015 in San Francisco</a></h3>
- <div class="entry-date">May 14, 2015</div>
+ <div class="entry-date">May 15, 2015</div>
</header>
<div class="entry-content"><p>There is one month left until <a href="https://spark-summit.org/2015/">Spark Summit 2015</a>, which
will be held in San Francisco on June 15th to 17th.
@@ -383,10 +374,9 @@ The Summit will contain <a href="https://spark-summit.org/2015/schedule/">presen
<article class="hentry">
<header class="entry-header">
<h3 class="entry-title"><a href="/news/spark-summit-europe.html">Announcing Spark Summit Europe</a></h3>
- <div class="entry-date">May 14, 2015</div>
+ <div class="entry-date">May 15, 2015</div>
</header>
<div class="entry-content"><p>Abstract submissions are now open for the first ever <a href="https://www.prevalentdesignevents.com/sparksummit2015/europe/speaker/">Spark Summit Europe</a>. The event will take place on October 27th to 29th in Amsterdam. Submissions are welcome across a variety of Spark related topics, including use cases and ongoing development.</p>
-
</div>
</article>
@@ -395,7 +385,7 @@ The Summit will contain <a href="https://spark-summit.org/2015/schedule/">presen
<h3 class="entry-title"><a href="/news/spark-summit-east-2015-videos-posted.html">Spark Summit East 2015 Videos Posted</a></h3>
<div class="entry-date">April 20, 2015</div>
</header>
- <div class="entry-content"><p>The videos and slides for Spark Summit East 2015 are now all <a href="http://spark-summit.org/east/2015">available online</a>. Watch them to get the latest news from the Spark community as well as use cases and applications built on top. </p>
+ <div class="entry-content"><p>The videos and slides for Spark Summit East 2015 are now all <a href="http://spark-summit.org/east/2015">available online</a>. Watch them to get the latest news from the Spark community as well as use cases and applications built on top.</p>
</div>
</article>
@@ -405,7 +395,7 @@ The Summit will contain <a href="https://spark-summit.org/2015/schedule/">presen
<h3 class="entry-title"><a href="/news/spark-1-2-2-released.html">Spark 1.2.2 and 1.3.1 released</a></h3>
<div class="entry-date">April 17, 2015</div>
</header>
- <div class="entry-content"><p>We are happy to announce the availability of <a href="/releases/spark-release-1-2-2.html" title="Spark Release 1.2.2">Spark 1.2.2</a> and <a href="/releases/spark-release-1-3-1.html" title="Spark Release 1.3.1">Spark 1.3.1</a>! These are both maintenance releases that collectively feature the work of more than 90 developers. </p>
+ <div class="entry-content"><p>We are happy to announce the availability of <a href="/releases/spark-release-1-2-2.html" title="Spark Release 1.2.2">Spark 1.2.2</a> and <a href="/releases/spark-release-1-3-1.html" title="Spark Release 1.3.1">Spark 1.3.1</a>! These are both maintenance releases that collectively feature the work of more than 90 developers.</p>
</div>
</article>
@@ -517,7 +507,7 @@ The Summit will contain <a href="https://spark-summit.org/2015/schedule/">presen
</header>
<div class="entry-content"><p>We are happy to announce the availability of <a href="/releases/spark-release-0-9-2.html" title="Spark Release 0.9.2">
Spark 0.9.2</a>! Apache Spark 0.9.2 is a maintenance release with bug fixes. We recommend all 0.9.x users to upgrade to this stable release.
-Contributions to this release came from 28 developers. </p>
+Contributions to this release came from 28 developers.</p>
</div>
</article>
@@ -588,7 +578,7 @@ about the latest happenings in Spark.</p>
<div class="entry-content"><p>We are happy to announce the availability of <a href="/releases/spark-release-0-9-1.html" title="Spark Release 0.9.1">
Spark 0.9.1</a>! Apache Spark 0.9.1 is a maintenance release with bug fixes, performance improvements, better stability with YARN and
improved parity of the Scala and Python API. We recommend all 0.9.0 users to upgrade to this stable release.
-Contributions to this release came from 37 developers. </p>
+Contributions to this release came from 37 developers.</p>
</div>
</article>
@@ -636,7 +626,6 @@ hardened YARN support.</p>
<div class="entry-date">December 19, 2013</div>
</header>
<div class="entry-content"><p>We&#8217;ve just posted <a href="/releases/spark-release-0-8-1.html" title="Spark Release 0.8.1">Spark Release 0.8.1</a>, a maintenance and performance release for the Scala 2.9 version of Spark. 0.8.1 includes support for YARN 2.2, a high availability mode for the standalone scheduler, optimizations to the shuffle, and many other improvements. We recommend that all users update to this release. Visit the <a href="/releases/spark-release-0-8-1.html" title="Spark Release 0.8.1">release notes</a> to read about the new features, or <a href="/downloads.html">download</a> the release today.</p>
-
</div>
</article>
@@ -667,7 +656,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">September 25, 2013</div>
</header>
<div class="entry-content"><p>We&#8217;re proud to announce the release of <a href="/releases/spark-release-0-8-0.html" title="Spark Release 0.8.0">Apache Spark 0.8.0</a>. Spark 0.8.0 is a major release that includes many new capabilities and usability improvements. It’s also our first release under the Apache incubator. It is the largest Spark release yet, with contributions from 67 developers and 24 companies. Major new features include an expanded monitoring framework and UI, a machine learning library, and support for running Spark inside of YARN.</p>
-
</div>
</article>
@@ -697,7 +685,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">July 23, 2013</div>
</header>
<div class="entry-content"><p>Want to learn how to use Spark, Shark, GraphX, and related technologies in person? The AMP Lab is hosting a two-day training workshop for them on August 29th and 30th in Berkeley. The workshop will include tutorials, talks from users, and over four hours of hands-on exercises. <a href="http://ampcamp.berkeley.edu/amp-camp-three-berkeley-2013/">Registration is now open on the AMP Camp website</a>, for a price of $250 per person. We recommend signing up early because last year&#8217;s workshop was sold out.</p>
-
</div>
</article>
@@ -718,7 +705,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
</ul>
<p>Most users will probably want the User list, but individuals interested in contributing code to the project should also subscribe to the Dev list.</p>
-
</div>
</article>
@@ -728,7 +714,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">July 16, 2013</div>
</header>
<div class="entry-content"><p>We&#8217;ve just posted <a href="/releases/spark-release-0-7-3.html" title="Spark Release 0.7.3">Spark Release 0.7.3</a>, a maintenance release that contains several fixes, including streaming API updates and new functionality for adding JARs to a <code>spark-shell</code> session. We recommend that all users update to this release. Visit the <a href="/releases/spark-release-0-7-3.html" title="Spark Release 0.7.3">release notes</a> to read about the new features, or <a href="/downloads.html">download</a> the release today.</p>
-
</div>
</article>
@@ -738,7 +723,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">June 21, 2013</div>
</header>
<div class="entry-content"><p>Spark, its creators at the AMP Lab, and some of its users were featured in a <a href="http://www.wired.com/wiredenterprise/2013/06/yahoo-amazon-amplab-spark/all/">Wired Enterprise article</a> a few days ago. Read on to learn a little about how Spark is being used in industry.</p>
-
</div>
</article>
@@ -748,7 +732,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">June 21, 2013</div>
</header>
<div class="entry-content"><p>Spark was recently <a href="http://mail-archives.apache.org/mod_mbox/incubator-general/201306.mbox/%3CCDE7B773.E9A48%25chris.a.mattmann%40jpl.nasa.gov%3E">accepted</a> into the <a href="http://incubator.apache.org">Apache Incubator</a>, which will serve as the long-term home for the project. While moving the source code and issue tracking to Apache will take some time, we are excited to be joining the community at Apache. Stay tuned on this site for updates on how the project hosting will change.</p>
-
</div>
</article>
@@ -758,7 +741,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">June 2, 2013</div>
</header>
<div class="entry-content"><p>We&#8217;re happy to announce the release of <a href="/releases/spark-release-0-7-2.html" title="Spark Release 0.7.2">Spark 0.7.2</a>, a new maintenance release that includes several bug fixes and improvements, as well as new code examples and API features. We recommend that all users update to this release. Head over to the <a href="/releases/spark-release-0-7-2.html" title="Spark Release 0.7.2">release notes</a> to read about the new features, or <a href="/downloads.html">download</a> the release today.</p>
-
</div>
</article>
@@ -774,7 +756,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<p>The second screencast is a 2 minute <a href="/screencasts/2-spark-documentation-overview.html">overview of the Spark documentation</a>.</p>
<p>We hope you find these screencasts useful.</p>
-
</div>
</article>
@@ -784,7 +765,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">March 17, 2013</div>
</header>
<div class="entry-content"><p>At this year&#8217;s <a href="http://strataconf.com/strata2013">Strata</a> conference, the AMP Lab hosted a full day of tutorials on Spark, Shark, and Spark Streaming, including online exercises on Amazon EC2. Those exercises are now <a href="http://ampcamp.berkeley.edu/big-data-mini-course/">available online</a>, letting you learn Spark and Shark at your own pace on an EC2 cluster with real data. They are a great resource for learning the systems. You can also find <a href="http://ampcamp.berkeley.edu/amp-camp-two-strata-2013/">slides</a> from the Strata tutorials online, as well as <a href="http://ampcamp.berkeley.edu/amp-camp-one-berkeley-2012/">videos</a> from the AMP Camp workshop we held at Berkeley in August.</p>
-
</div>
</article>
@@ -794,7 +774,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">February 27, 2013</div>
</header>
<div class="entry-content"><p>We&#8217;re proud to announce the release of <a href="/releases/spark-release-0-7-0.html" title="Spark Release 0.7.0">Spark 0.7.0</a>, a new major version of Spark that adds several key features, including a <a href="/docs/latest/python-programming-guide.html">Python API</a> for Spark and an <a href="/docs/latest/streaming-programming-guide.html">alpha of Spark Streaming</a>. This release is the result of the largest group of contributors yet behind a Spark release &#8211; 31 contributors from inside and outside Berkeley. Head over to the <a href="/releases/spark-release-0-7-0.html" title="Spark Release 0.7.0">release notes</a> to read more about the new features, or <a href="/downloads.html">download</a> the release today.</p>
-
</div>
</article>
@@ -804,7 +783,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">February 24, 2013</div>
</header>
<div class="entry-content"><p>This weekend, Amazon posted an <a href="http://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923">article</a> and code that make it easy to launch Spark and Shark on Elastic MapReduce. The article includes examples of how to run both interactive Scala commands and SQL queries from Shark on data in S3. Head over to the <a href="http://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923">Amazon article</a> for details. We&#8217;re very excited because, to our knowledge, this makes Spark the first non-Hadoop engine that you can launch with EMR.</p>
-
</div>
</article>
@@ -814,7 +792,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">February 7, 2013</div>
</header>
<div class="entry-content"><p>We recently released <a href="/releases/spark-release-0-6-2.html" title="Spark Release 0.6.2">Spark 0.6.2</a>, a new version of Spark. This is a maintenance release that includes several bug fixes and usability improvements (see the <a href="/releases/spark-release-0-6-2.html" title="Spark Release 0.6.2">release notes</a>). We recommend that all users upgrade to this release.</p>
-
</div>
</article>
@@ -829,7 +806,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<li><a href="http://blog.quantifind.com/posts/logging-post/">Configuring Spark's logs</a></li>
</ul>
<p>Thanks for sharing this, and looking forward to seeing others!</p>
-
</div>
</article>
@@ -839,7 +815,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">December 21, 2012</div>
</header>
<div class="entry-content"><p>On December 18th, we held the first of a series of Spark development meetups, for people interested in learning the Spark codebase and contributing to the project. There was quite a bit more demand than we anticipated, with over 80 people signing up and 64 attending. The first meetup was an <a href="http://www.meetup.com/spark-users/events/94101942/">introduction to Spark internals</a>. Thanks to one of the attendees, there&#8217;s now a <a href="http://www.youtube.com/watch?v=49Hr5xZyTEA">video of the meetup</a> on YouTube. We&#8217;ve also posted the <a href="http://files.meetup.com/3138542/dev-meetup-dec-2012.pptx">slides</a>. Look to see more development meetups on Spark and Shark in the future.</p>
-
</div>
</article>
@@ -858,8 +833,7 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<li><a href="http://data-informed.com/spark-an-open-source-engine-for-iterative-data-mining/">DataInformed</a> interviewed two Spark users and wrote about their applications in anomaly detection, predictive analytics and data mining.</li>
</ul>
-<p>In other news, there will be a full day of tutorials on Spark and Shark at the <a href="http://strataconf.com/strata2013">O&#8217;Reilly Strata conference</a> in February. They include a three-hour <a href="http://strataconf.com/strata2013/public/schedule/detail/27438">introduction to Spark, Shark and BDAS</a> Tuesday morning, and a three-hour <a href="http://strataconf.com/strata2013/public/schedule/detail/27440">hands-on exercise session</a>. </p>
-
+<p>In other news, there will be a full day of tutorials on Spark and Shark at the <a href="http://strataconf.com/strata2013">O&#8217;Reilly Strata conference</a> in February. They include a three-hour <a href="http://strataconf.com/strata2013/public/schedule/detail/27438">introduction to Spark, Shark and BDAS</a> Tuesday morning, and a three-hour <a href="http://strataconf.com/strata2013/public/schedule/detail/27440">hands-on exercise session</a>.</p>
</div>
</article>
@@ -869,7 +843,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">November 22, 2012</div>
</header>
<div class="entry-content"><p>Today we&#8217;ve made available two maintenance releases for Spark: <a href="/releases/spark-release-0-6-1.html" title="Spark Release 0.6.1">0.6.1</a> and <a href="/releases/spark-release-0-5-2.html" title="Spark Release 0.5.2">0.5.2</a>. They both contain important bug fixes as well as some new features, such as the ability to build against Hadoop 2 distributions. We recommend that users update to the latest version for their branch; for new users, we recommend <a href="/releases/spark-release-0-6-1.html" title="Spark Release 0.6.1">0.6.1</a>.</p>
-
</div>
</article>
@@ -879,7 +852,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">October 15, 2012</div>
</header>
<div class="entry-content"><p><a href="/releases/spark-release-0-6-0.html">Spark version 0.6.0</a> was released today, a major release that brings a wide range of performance improvements and new features, including a simpler standalone deploy mode and a Java API. Read more about it in the <a href="/releases/spark-release-0-6-0.html">release notes</a>.</p>
-
</div>
</article>
@@ -889,7 +861,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">April 25, 2012</div>
</header>
<div class="entry-content"><p>Our <a href="http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf">paper on Spark</a> won the Best Paper Award at the <a href="http://www.usenix.org/nsdi12/">USENIX NSDI conference</a>. You can see a video of the talk, as well as slides, online on the <a href="https://www.usenix.org/conference/nsdi12/resilient-distributed-datasets-fault-tolerant-abstraction-memory-cluster-computing">NSDI website</a>.</p>
-
</div>
</article>
@@ -899,7 +870,6 @@ Over 450 Spark developers and enthusiasts from 13 countries and more than 180 co
<div class="entry-date">January 10, 2012</div>
</header>
<div class="entry-content"><p>We&#8217;ve started hosting a regular <a href="http://www.meetup.com/spark-users/">Bay Area Spark User Meetup</a>. Sign up on the meetup.com page to be notified about events and meet other Spark developers and users.</p>
-
</div>
</article>
diff --git a/site/news/spark-0-9-1-released.html b/site/news/spark-0-9-1-released.html
index a803728e7..25e289763 100644
--- a/site/news/spark-0-9-1-released.html
+++ b/site/news/spark-0-9-1-released.html
@@ -189,7 +189,7 @@
<p>We are happy to announce the availability of <a href="/releases/spark-release-0-9-1.html" title="Spark Release 0.9.1">
Spark 0.9.1</a>! Apache Spark 0.9.1 is a maintenance release with bug fixes, performance improvements, better stability with YARN and
improved parity of the Scala and Python APIs. We recommend that all 0.9.0 users upgrade to this stable release.
-Contributions to this release came from 37 developers. </p>
+Contributions to this release came from 37 developers.</p>
<p>Visit the <a href="/releases/spark-release-0-9-1.html" title="Spark Release 0.9.1">release notes</a>
to read about the new features, or <a href="/downloads.html">download</a> the release today.</p>
diff --git a/site/news/spark-0-9-2-released.html b/site/news/spark-0-9-2-released.html
index 748344169..1c90c73e0 100644
--- a/site/news/spark-0-9-2-released.html
+++ b/site/news/spark-0-9-2-released.html
@@ -188,7 +188,7 @@
<p>We are happy to announce the availability of <a href="/releases/spark-release-0-9-2.html" title="Spark Release 0.9.2">
Spark 0.9.2</a>! Apache Spark 0.9.2 is a maintenance release with bug fixes. We recommend that all 0.9.x users upgrade to this stable release.
-Contributions to this release came from 28 developers. </p>
+Contributions to this release came from 28 developers.</p>
<p>Visit the <a href="/releases/spark-release-0-9-2.html" title="Spark Release 0.9.2">release notes</a>
to read about the new features, or <a href="/downloads.html">download</a> the release today.</p>
diff --git a/site/news/spark-1-1-0-released.html b/site/news/spark-1-1-0-released.html
index 5733ef26e..793a1052c 100644
--- a/site/news/spark-1-1-0-released.html
+++ b/site/news/spark-1-1-0-released.html
@@ -188,7 +188,7 @@
<p>We are happy to announce the availability of <a href="/releases/spark-release-1-1-0.html" title="Spark Release 1.1.0">Spark 1.1.0</a>! Spark 1.1.0 is the second release on the API-compatible 1.X line. It is Spark&#8217;s largest release ever, with contributions from 171 developers!</p>
-<p>This release brings operational and performance improvements in Spark core including a new implementation of the Spark shuffle designed for very large scale workloads. Spark 1.1 adds significant extensions to the newest Spark modules, MLlib and Spark SQL. Spark SQL introduces a JDBC server, byte code generation for fast expression evaluation, a public types API, JSON support, and other features and optimizations. MLlib introduces a new statistics libary along with several new algorithms and optimizations. Spark 1.1 also builds out Spark’s Python support and adds new components to the Spark Streaming module. </p>
+<p>This release brings operational and performance improvements in Spark core, including a new implementation of the Spark shuffle designed for very large scale workloads. Spark 1.1 adds significant extensions to the newest Spark modules, MLlib and Spark SQL. Spark SQL introduces a JDBC server, byte code generation for fast expression evaluation, a public types API, JSON support, and other features and optimizations. MLlib introduces a new statistics library along with several new algorithms and optimizations. Spark 1.1 also builds out Spark’s Python support and adds new components to the Spark Streaming module.</p>
<p>Visit the <a href="/releases/spark-release-1-1-0.html" title="Spark Release 1.1.0">release notes</a> to read about the new features, or <a href="/downloads.html">download</a> the release today.</p>
diff --git a/site/news/spark-1-2-2-released.html b/site/news/spark-1-2-2-released.html
index edd221378..99f15c23c 100644
--- a/site/news/spark-1-2-2-released.html
+++ b/site/news/spark-1-2-2-released.html
@@ -186,7 +186,7 @@
<h2>Spark 1.2.2 and 1.3.1 released</h2>
-<p>We are happy to announce the availability of <a href="/releases/spark-release-1-2-2.html" title="Spark Release 1.2.2">Spark 1.2.2</a> and <a href="/releases/spark-release-1-3-1.html" title="Spark Release 1.3.1">Spark 1.3.1</a>! These are both maintenance releases that collectively feature the work of more than 90 developers. </p>
+<p>We are happy to announce the availability of <a href="/releases/spark-release-1-2-2.html" title="Spark Release 1.2.2">Spark 1.2.2</a> and <a href="/releases/spark-release-1-3-1.html" title="Spark Release 1.3.1">Spark 1.3.1</a>! These are both maintenance releases that collectively feature the work of more than 90 developers.</p>
<p>To download either release, visit the <a href="/downloads.html">downloads</a> page.</p>
diff --git a/site/news/spark-and-shark-in-the-news.html b/site/news/spark-and-shark-in-the-news.html
index 74869ebd1..23ae1e253 100644
--- a/site/news/spark-and-shark-in-the-news.html
+++ b/site/news/spark-and-shark-in-the-news.html
@@ -196,7 +196,7 @@
<li><a href="http://data-informed.com/spark-an-open-source-engine-for-iterative-data-mining/">DataInformed</a> interviewed two Spark users and wrote about their applications in anomaly detection, predictive analytics and data mining.</li>
</ul>
-<p>In other news, there will be a full day of tutorials on Spark and Shark at the <a href="http://strataconf.com/strata2013">O&#8217;Reilly Strata conference</a> in February. They include a three-hour <a href="http://strataconf.com/strata2013/public/schedule/detail/27438">introduction to Spark, Shark and BDAS</a> Tuesday morning, and a three-hour <a href="http://strataconf.com/strata2013/public/schedule/detail/27440">hands-on exercise session</a>. </p>
+<p>In other news, there will be a full day of tutorials on Spark and Shark at the <a href="http://strataconf.com/strata2013">O&#8217;Reilly Strata conference</a> in February. They include a three-hour <a href="http://strataconf.com/strata2013/public/schedule/detail/27438">introduction to Spark, Shark and BDAS</a> Tuesday morning, and a three-hour <a href="http://strataconf.com/strata2013/public/schedule/detail/27440">hands-on exercise session</a>.</p>
<p>
diff --git a/site/news/spark-summit-east-2015-videos-posted.html b/site/news/spark-summit-east-2015-videos-posted.html
index 4d237eff9..204c13d29 100644
--- a/site/news/spark-summit-east-2015-videos-posted.html
+++ b/site/news/spark-summit-east-2015-videos-posted.html
@@ -186,7 +186,7 @@
<h2>Spark Summit East 2015 Videos Posted</h2>
-<p>The videos and slides for Spark Summit East 2015 are now all <a href="http://spark-summit.org/east/2015">available online</a>. Watch them to get the latest news from the Spark community as well as use cases and applications built on top. </p>
+<p>The videos and slides for Spark Summit East 2015 are now all <a href="http://spark-summit.org/east/2015">available online</a>. Watch them to get the latest news from the Spark community as well as use cases and applications built on top.</p>
<p>If you like what you see, consider joining us at the <a href="http://spark-summit.org/2015/agenda">2015 Spark Summit</a> in San Francisco.</p>
diff --git a/site/releases/spark-release-0-8-0.html b/site/releases/spark-release-0-8-0.html
index 9815cfea9..78fcbcf4c 100644
--- a/site/releases/spark-release-0-8-0.html
+++ b/site/releases/spark-release-0-8-0.html
@@ -210,13 +210,13 @@
<p>Spark’s internal job scheduler has been refactored and extended to include more sophisticated scheduling policies. In particular, a <a href="http://spark.incubator.apache.org/docs/0.8.0/job-scheduling.html#scheduling-within-an-application">fair scheduler</a> implementation now allows multiple users to share an instance of Spark, which helps users running shorter jobs to achieve good performance, even when longer-running jobs are running in parallel. Support for topology-aware scheduling has been extended, including the ability to take into account rack locality and support for multiple executors on a single machine.</p>
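A minimal sketch of opting into the fair scheduler; this uses the current PySpark `SparkConf` API rather than the 0.8-era system properties, and the application name is a placeholder:

```python
from pyspark import SparkConf, SparkContext

# Switch the in-application scheduler from the default FIFO policy to FAIR,
# so jobs submitted from different threads share executors more evenly.
conf = SparkConf().setAppName("shared-app").set("spark.scheduler.mode", "FAIR")
sc = SparkContext(conf=conf)
```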
<h3 id="easier-deployment-and-linking">Easier Deployment and Linking</h3>
-<p>User programs can now link to Spark no matter which Hadoop version they need, without having to publish a version of <code>spark-core</code> specifically for that Hadoop version. An explanation of how to link against different Hadoop versions is provided <a href="http://spark.incubator.apache.org/docs/0.8.0/scala-programming-guide.html#linking-with-spark">here</a>. </p>
+<p>User programs can now link to Spark no matter which Hadoop version they need, without having to publish a version of <code>spark-core</code> specifically for that Hadoop version. An explanation of how to link against different Hadoop versions is provided <a href="http://spark.incubator.apache.org/docs/0.8.0/scala-programming-guide.html#linking-with-spark">here</a>.</p>
<h3 id="expanded-ec2-capabilities">Expanded EC2 Capabilities</h3>
<p>Spark’s EC2 scripts now support launching in any availability zone. Support has also been added for EC2 instance types which use the newer “HVM” architecture. This includes the cluster compute (cc1/cc2) family of instance types. We’ve also added support for running newer versions of HDFS alongside Spark. Finally, we’ve added the ability to launch clusters with maintenance releases of Spark in addition to launching the newest release.</p>
<h3 id="improved-documentation">Improved Documentation</h3>
-<p>This release adds documentation about cluster hardware provisioning and inter-operation with common Hadoop distributions. Docs are also included to cover the MLlib machine learning functions and new cluster monitoring features. Existing documentation has been updated to reflect changes in building and deploying Spark. </p>
+<p>This release adds documentation about cluster hardware provisioning and inter-operation with common Hadoop distributions. Docs are also included to cover the MLlib machine learning functions and new cluster monitoring features. Existing documentation has been updated to reflect changes in building and deploying Spark.</p>
<h3 id="other-improvements">Other Improvements</h3>
<ul>
diff --git a/site/releases/spark-release-0-9-1.html b/site/releases/spark-release-0-9-1.html
index 386b37200..c53f7cdac 100644
--- a/site/releases/spark-release-0-9-1.html
+++ b/site/releases/spark-release-0-9-1.html
@@ -201,9 +201,9 @@
<li>Fixed hash collision bug in external spilling [<a href="https://issues.apache.org/jira/browse/SPARK-1113">SPARK-1113</a>]</li>
<li>Fixed conflict with Spark’s log4j for users relying on other logging backends [<a href="https://issues.apache.org/jira/browse/SPARK-1190">SPARK-1190</a>]</li>
  <li>Fixed GraphX missing from Spark assembly jar in Maven builds</li>
- <li>Fixed silent failures due to map output status exceeding Akka frame size [<a href="https://issues.apache.org/jira/browse/SPARK-1244">SPARK-1244</a>] </li>
- <li>Removed Spark’s unnecessary direct dependency on ASM [<a href="https://issues.apache.org/jira/browse/SPARK-782">SPARK-782</a>] </li>
- <li>Removed metrics-ganglia from default build due to LGPL license conflict [<a href="https://issues.apache.org/jira/browse/SPARK-1167">SPARK-1167</a>] </li>
+ <li>Fixed silent failures due to map output status exceeding Akka frame size [<a href="https://issues.apache.org/jira/browse/SPARK-1244">SPARK-1244</a>]</li>
+ <li>Removed Spark’s unnecessary direct dependency on ASM [<a href="https://issues.apache.org/jira/browse/SPARK-782">SPARK-782</a>]</li>
+ <li>Removed metrics-ganglia from default build due to LGPL license conflict [<a href="https://issues.apache.org/jira/browse/SPARK-1167">SPARK-1167</a>]</li>
<li>Fixed bug in distribution tarball not containing spark assembly jar [<a href="https://issues.apache.org/jira/browse/SPARK-1184">SPARK-1184</a>]</li>
<li>Fixed bug causing infinite NullPointerException failures due to a null in map output locations [<a href="https://issues.apache.org/jira/browse/SPARK-1124">SPARK-1124</a>]</li>
<li>Fixed bugs in post-job cleanup of scheduler’s data structures</li>
@@ -219,7 +219,7 @@
<li>Fixed bug making Spark application stall when YARN registration fails [<a href="https://issues.apache.org/jira/browse/SPARK-1032">SPARK-1032</a>]</li>
<li>Race condition in getting HDFS delegation tokens in yarn-client mode [<a href="https://issues.apache.org/jira/browse/SPARK-1203">SPARK-1203</a>]</li>
<li>Fixed bug in yarn-client mode not exiting properly [<a href="https://issues.apache.org/jira/browse/SPARK-1049">SPARK-1049</a>]</li>
- <li>Fixed regression bug in ADD_JAR environment variable not correctly adding custom jars [<a href="https://issues.apache.org/jira/browse/SPARK-1089">SPARK-1089</a>] </li>
+ <li>Fixed regression bug in ADD_JAR environment variable not correctly adding custom jars [<a href="https://issues.apache.org/jira/browse/SPARK-1089">SPARK-1089</a>]</li>
</ul>
<h3 id="improvements-to-other-deployment-scenarios">Improvements to other deployment scenarios</h3>
@@ -230,19 +230,19 @@
<h3 id="optimizations-to-mllib">Optimizations to MLLib</h3>
<ul>
- <li>Optimized memory usage of ALS [<a href="https://issues.apache.org/jira/browse/MLLIB-25">MLLIB-25</a>] </li>
+ <li>Optimized memory usage of ALS [<a href="https://issues.apache.org/jira/browse/MLLIB-25">MLLIB-25</a>]</li>
<li>Optimized computation of YtY for implicit ALS [<a href="https://issues.apache.org/jira/browse/SPARK-1237">SPARK-1237</a>]</li>
<li>Support for negative implicit input in ALS [<a href="https://issues.apache.org/jira/browse/MLLIB-22">MLLIB-22</a>]</li>
<li>Setting of a random seed in ALS [<a href="https://issues.apache.org/jira/browse/SPARK-1238">SPARK-1238</a>]</li>
- <li>Faster construction of features with intercept [<a href="https://issues.apache.org/jira/browse/SPARK-1260">SPARK-1260</a>] </li>
+ <li>Faster construction of features with intercept [<a href="https://issues.apache.org/jira/browse/SPARK-1260">SPARK-1260</a>]</li>
<li>Check for intercept and weight in GLM’s addIntercept [<a href="https://issues.apache.org/jira/browse/SPARK-1327">SPARK-1327</a>]</li>
</ul>
<h3 id="bug-fixes-and-better-api-parity-for-pyspark">Bug fixes and better API parity for PySpark</h3>
<ul>
<li>Fixed bug in Python de-pickling [<a href="https://issues.apache.org/jira/browse/SPARK-1135">SPARK-1135</a>]</li>
- <li>Fixed bug in serialization of strings longer than 64K [<a href="https://issues.apache.org/jira/browse/SPARK-1043">SPARK-1043</a>] </li>
- <li>Fixed bug that made jobs hang when base file is not available [<a href="https://issues.apache.org/jira/browse/SPARK-1025">SPARK-1025</a>] </li>
+ <li>Fixed bug in serialization of strings longer than 64K [<a href="https://issues.apache.org/jira/browse/SPARK-1043">SPARK-1043</a>]</li>
+ <li>Fixed bug that made jobs hang when base file is not available [<a href="https://issues.apache.org/jira/browse/SPARK-1025">SPARK-1025</a>]</li>
<li>Added Missing RDD operations to PySpark - top, zip, foldByKey, repartition, coalesce, getStorageLevel, setName and toDebugString</li>
</ul>
@@ -274,13 +274,13 @@
<li>Kay Ousterhout - Multiple bug fixes in scheduler&#8217;s handling of task failures</li>
<li>Kousuke Saruta - Use of https to access github</li>
<li>Mark Grover - Bug fix in distribution tar.gz</li>
- <li>Matei Zaharia - Bug fixes in handling of task failures due to NPE, and cleaning up of scheduler data structures </li>
+ <li>Matei Zaharia - Bug fixes in handling of task failures due to NPE, and cleaning up of scheduler data structures</li>
  <li>Nan Zhu - Bug fixes in PySpark RDD.takeSample and adding of JARs using ADD_JAR, and improvements to docs</li>
<li>Nick Lanham - Added ability to make distribution tarballs with Tachyon</li>
  <li>Patrick Wendell - Bug fixes in ASM shading, fixes for log4j initialization, removing Ganglia due to LGPL license, and other miscellaneous bug fixes</li>
<li>Prabin Banka - RDD.zip and other missing RDD operations in PySpark</li>
<li>Prashant Sharma - RDD.foldByKey in PySpark, and other PySpark doc improvements</li>
- <li>Qiuzhuang - Bug fix in standalone worker </li>
+ <li>Qiuzhuang - Bug fix in standalone worker</li>
<li>Raymond Liu - Changed working directory in ZookeeperPersistenceEngine</li>
<li>Reynold Xin - Improvements to docs and test infrastructure</li>
<li>Sandy Ryza - Multiple important Yarn bug fixes and improvements</li>
diff --git a/site/releases/spark-release-1-0-1.html b/site/releases/spark-release-1-0-1.html
index 22905b65a..6e568e3a6 100644
--- a/site/releases/spark-release-1-0-1.html
+++ b/site/releases/spark-release-1-0-1.html
@@ -258,8 +258,8 @@
<li>Cheng Hao &#8211; SQL features</li>
<li>Cheng Lian &#8211; SQL features</li>
  <li>Christian Tzolov &#8211; build improvement</li>
- <li>Clément MATHIEU &#8211; doc updates </li>
- <li>CodingCat &#8211; doc updates and bug fix </li>
+ <li>Clément MATHIEU &#8211; doc updates</li>
+ <li>CodingCat &#8211; doc updates and bug fix</li>
<li>Colin McCabe &#8211; bug fix</li>
<li>Daoyuan &#8211; SQL joins</li>
<li>David Lemieux &#8211; bug fix</li>
@@ -275,7 +275,7 @@
<li>Kan Zhang &#8211; PySpark SQL features</li>
<li>Kay Ousterhout &#8211; documentation fix</li>
<li>LY Lai &#8211; bug fix</li>
- <li>Lars Albertsson &#8211; bug fix </li>
+ <li>Lars Albertsson &#8211; bug fix</li>
<li>Lei Zhang &#8211; SQL fix and feature</li>
<li>Mark Hamstra &#8211; bug fix</li>
<li>Matei Zaharia &#8211; doc updates and bug fix</li>
@@ -297,7 +297,7 @@
<li>Shixiong Zhu &#8211; code clean-up</li>
<li>Szul, Piotr &#8211; bug fix</li>
<li>Takuya UESHIN &#8211; bug fixes and SQL features</li>
- <li>Thomas Graves &#8211; bug fix </li>
+ <li>Thomas Graves &#8211; bug fix</li>
<li>Uri Laserson &#8211; bug fix</li>
<li>Vadim Chekan &#8211; bug fix</li>
<li>Varakhedi Sujeet &#8211; ec2 r3 support</li>
diff --git a/site/releases/spark-release-1-0-2.html b/site/releases/spark-release-1-0-2.html
index ae5916ab1..304446bef 100644
--- a/site/releases/spark-release-1-0-2.html
+++ b/site/releases/spark-release-1-0-2.html
@@ -268,7 +268,7 @@
<li>johnnywalleye - Bug fixes in MLlib</li>
<li>joyyoj - Bug fix in Streaming</li>
<li>kballou - Doc fix</li>
- <li>lianhuiwang - Doc fix </li>
+ <li>lianhuiwang - Doc fix</li>
<li>witgo - Bug fix in sbt</li>
</ul>
diff --git a/site/releases/spark-release-1-1-0.html b/site/releases/spark-release-1-1-0.html
index 7f8812649..2c3bffcdb 100644
--- a/site/releases/spark-release-1-1-0.html
+++ b/site/releases/spark-release-1-1-0.html
@@ -197,7 +197,7 @@
<p>Spark SQL adds a number of new features and performance improvements in this release. A <a href="http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#running-the-thrift-jdbc-server">JDBC/ODBC server</a> allows users to connect to SparkSQL from many different applications and provides shared access to cached tables. A new module provides <a href="http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets">support for loading JSON data</a> directly into Spark’s SchemaRDD format, including automatic schema inference. Spark SQL introduces <a href="http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#other-configuration-options">dynamic bytecode generation</a> in this release, a technique which significantly speeds up execution for queries that perform complex expression evaluation. This release also adds support for registering Python, Scala, and Java lambda functions as UDFs, which can then be called directly in SQL. Spark 1.1 adds a <a href="http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#programmatically-specifying-the-schema">public types API to allow users to create SchemaRDD’s from custom data sources</a>. Finally, many optimizations have been added to the native Parquet support as well as throughout the engine.</p>
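A short PySpark sketch of the JSON support and table registration described above; the HDFS path is a placeholder, `sc` is an existing SparkContext, and the 1.1-era SchemaRDD API is assumed:

```python
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

# Load JSON directly into a SchemaRDD; the schema is inferred automatically.
people = sqlContext.jsonFile("hdfs://.../people.json")
people.registerTempTable("people")

# Tables registered (and optionally cached) this way are also reachable from
# external tools through the new JDBC/ODBC server.
teenagers = sqlContext.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19")
```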
<h3 id="mllib">MLlib</h3>
-<p>MLlib adds several new algorithms and optimizations in this release. 1.1 introduces a <a href="https://issues.apache.org/jira/browse/SPARK-2359">new library of statistical packages</a> which provides exploratory analytic functions. These include stratified sampling, correlations, chi-squared tests and support for creating random datasets. This release adds utilities for feature extraction (<a href="https://issues.apache.org/jira/browse/SPARK-2510">Word2Vec</a> and <a href="https://issues.apache.org/jira/browse/SPARK-2511">TF-IDF</a>) and feature transformation (<a href="https://issues.apache.org/jira/browse/SPARK-2272">normalization and standard scaling</a>). Also new are support for <a href="https://issues.apache.org/jira/browse/SPARK-1553">nonnegative matrix factorization</a> and <a href="https://issues.apache.org/jira/browse/SPARK-1782">SVD via Lanczos</a>. The decision tree algorithm has been <a href="https://issues.apache.org/jira/browse/SPARK-2478">added in Python and Java</a>. A tree aggregation primitive has been added to help optimize many existing algorithms. Performance improves across the board in MLlib 1.1, with improvements of around 2-3X for many algorithms and up to 5X for large scale decision tree problems. </p>
+<p>MLlib adds several new algorithms and optimizations in this release. 1.1 introduces a <a href="https://issues.apache.org/jira/browse/SPARK-2359">new library of statistical packages</a> which provides exploratory analytic functions. These include stratified sampling, correlations, chi-squared tests and support for creating random datasets. This release adds utilities for feature extraction (<a href="https://issues.apache.org/jira/browse/SPARK-2510">Word2Vec</a> and <a href="https://issues.apache.org/jira/browse/SPARK-2511">TF-IDF</a>) and feature transformation (<a href="https://issues.apache.org/jira/browse/SPARK-2272">normalization and standard scaling</a>). Also new are support for <a href="https://issues.apache.org/jira/browse/SPARK-1553">nonnegative matrix factorization</a> and <a href="https://issues.apache.org/jira/browse/SPARK-1782">SVD via Lanczos</a>. The decision tree algorithm has been <a href="https://issues.apache.org/jira/browse/SPARK-2478">added in Python and Java</a>. A tree aggregation primitive has been added to help optimize many existing algorithms. Performance improves across the board in MLlib 1.1, with improvements of around 2-3X for many algorithms and up to 5X for large scale decision tree problems.</p>
<h3 id="graphx-and-spark-streaming">GraphX and Spark Streaming</h3>
<p>Spark streaming adds a new data source, <a href="https://issues.apache.org/jira/browse/SPARK-1981">Amazon Kinesis</a>. For Apache Flume, a new mode is supported which <a href="https://issues.apache.org/jira/browse/SPARK-1729">pulls data from Flume</a>, simplifying deployment and providing high availability. The first of a set of <a href="https://issues.apache.org/jira/browse/SPARK-2438">streaming machine learning algorithms</a> is introduced with streaming linear regression. Finally, <a href="https://issues.apache.org/jira/browse/SPARK-1341">rate limiting</a> has been added for streaming inputs. GraphX adds <a href="https://issues.apache.org/jira/browse/SPARK-1991">custom storage levels for vertices and edges</a> along with <a href="https://issues.apache.org/jira/browse/SPARK-2748">improved numerical precision</a> across the board. GraphX also adds a new label propagation algorithm.</p>
@@ -215,7 +215,7 @@
<ul>
<li>The default value of <code>spark.io.compression.codec</code> is now <code>snappy</code> for improved memory usage. Old behavior can be restored by switching to <code>lzf</code>.</li>
- <li>The default value of <code>spark.broadcast.factory</code> is now <code>org.apache.spark.broadcast.TorrentBroadcastFactory</code> for improved efficiency of broadcasts. Old behavior can be restored by switching to <code>org.apache.spark.broadcast.HttpBroadcastFactory</code>. </li>
+ <li>The default value of <code>spark.broadcast.factory</code> is now <code>org.apache.spark.broadcast.TorrentBroadcastFactory</code> for improved efficiency of broadcasts. Old behavior can be restored by switching to <code>org.apache.spark.broadcast.HttpBroadcastFactory</code>.</li>
<li>PySpark now performs external spilling during aggregations. Old behavior can be restored by setting <code>spark.shuffle.spill</code> to <code>false</code>.</li>
<li>PySpark uses a new heuristic for determining the parallelism of shuffle operations. Old behavior can be restored by setting <code>spark.default.parallelism</code> to the number of cores in the cluster.</li>
</ul>
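The old behaviors listed above can be restored through configuration; a sketch (set only what you need, and the parallelism value is a placeholder):

```python
from pyspark import SparkConf

conf = (SparkConf()
        .set("spark.io.compression.codec", "lzf")
        .set("spark.broadcast.factory",
             "org.apache.spark.broadcast.HttpBroadcastFactory")
        .set("spark.shuffle.spill", "false")
        .set("spark.default.parallelism", "16"))  # e.g. total cores in the cluster
```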
@@ -275,7 +275,7 @@
  <li>Daniel Darabos &#8211; bug fixes and UI enhancements</li>
<li>Daoyuan Wang &#8211; SQL fixes</li>
<li>David Lemieux &#8211; bug fix</li>
- <li>Davies Liu &#8211; PySpark fixes and spilling </li>
+ <li>Davies Liu &#8211; PySpark fixes and spilling</li>
<li>DB Tsai &#8211; online summaries in MLlib and other MLlib features</li>
<li>Derek Ma &#8211; bug fix</li>
<li>Doris Xin &#8211; MLlib stats library and several fixes</li>
diff --git a/site/releases/spark-release-1-2-0.html b/site/releases/spark-release-1-2-0.html
index 344be74b6..338ed75a6 100644
--- a/site/releases/spark-release-1-2-0.html
+++ b/site/releases/spark-release-1-2-0.html
@@ -194,7 +194,7 @@
<p>In 1.2, Spark core upgrades two major subsystems to improve the performance and stability of very large scale shuffles. The first is Spark’s communication manager used during bulk transfers, which upgrades to a <a href="https://issues.apache.org/jira/browse/SPARK-2468">netty-based implementation</a>. The second is Spark’s shuffle mechanism, which upgrades to the <a href="https://issues.apache.org/jira/browse/SPARK-3280">“sort based” shuffle initially released in Spark 1.1</a>. Spark also adds an <a href="https://issues.apache.org/jira/browse/SPARK-3174">elastic scaling mechanism</a> designed to improve cluster utilization during long-running ETL-style jobs. This is currently supported on YARN and will make its way to other cluster managers in future versions. Finally, Spark 1.2 adds support for Scala 2.11. For instructions on building for Scala 2.11 see the <a href="/docs/1.2.0/building-spark.html#building-for-scala-211">build documentation</a>.</p>
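A configuration sketch for the elastic scaling mechanism (SPARK-3174) on YARN; the executor bounds are placeholders:

```python
from pyspark import SparkConf, SparkContext

# Executors are requested as work queues up and released when idle. The
# external shuffle service keeps shuffle files alive across executor exits.
conf = (SparkConf()
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.shuffle.service.enabled", "true")
        .set("spark.dynamicAllocation.minExecutors", "2")
        .set("spark.dynamicAllocation.maxExecutors", "50"))
sc = SparkContext(conf=conf)
```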
<h3 id="spark-streaming">Spark Streaming</h3>
-<p>This release includes two major feature additions to Spark’s streaming library, a Python API and a write ahead log for full driver H/A. The <a href="https://issues.apache.org/jira/browse/SPARK-2377">Python API</a> covers almost all the DStream transformations and output operations. Input sources based on text files and text over sockets are currently supported. Support for Kafka and Flume input streams in Python will be added in the next release. Second, Spark streaming now features H/A driver support through a <a href="https://issues.apache.org/jira/browse/SPARK-3129">write ahead log (WAL)</a>. In Spark 1.1 and earlier, some buffered (received but not yet processed) data can be lost during driver restarts. To prevent this Spark 1.2 adds an optional WAL, which buffers received data into a fault-tolerant file system (e.g. HDFS). See the <a href="/docs/1.2.0/streaming-programming-guide.html">streaming programming guide</a> for more details. </p>
+<p>This release includes two major feature additions to Spark’s streaming library: a Python API and a write ahead log for full driver H/A. The <a href="https://issues.apache.org/jira/browse/SPARK-2377">Python API</a> covers almost all the DStream transformations and output operations. Input sources based on text files and text over sockets are currently supported. Support for Kafka and Flume input streams in Python will be added in the next release. Second, Spark streaming now features H/A driver support through a <a href="https://issues.apache.org/jira/browse/SPARK-3129">write ahead log (WAL)</a>. In Spark 1.1 and earlier, some buffered (received but not yet processed) data can be lost during driver restarts. To prevent this, Spark 1.2 adds an optional WAL, which buffers received data into a fault-tolerant file system (e.g. HDFS). See the <a href="/docs/1.2.0/streaming-programming-guide.html">streaming programming guide</a> for more details.</p>
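Enabling the optional WAL is a one-line configuration plus a checkpoint directory; a sketch (the host, port, and paths are placeholders):

```python
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

# Persist received data to a fault-tolerant filesystem before processing,
# so it survives driver restarts.
conf = SparkConf().set("spark.streaming.receiver.writeAheadLog.enable", "true")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)             # 10-second batches
ssc.checkpoint("hdfs://.../checkpoints")   # WAL data lives alongside checkpoints

lines = ssc.socketTextStream("localhost", 9999)  # text-over-socket input source
lines.count().pprint()
ssc.start()
ssc.awaitTermination()
```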
<h3 id="mllib">MLLib</h3>
<p>Spark 1.2 previews a new set of machine learning APIs in a package called spark.ml that <a href="https://issues.apache.org/jira/browse/SPARK-3530">supports learning pipelines</a>, where multiple algorithms are run in sequence with varying parameters. This type of pipeline is common in practical machine learning deployments. The new ML package uses Spark’s SchemaRDD to represent <a href="https://issues.apache.org/jira/browse/SPARK-3573">ML datasets</a>, providing direct interoperability with Spark SQL. In addition to the new API, Spark 1.2 extends decision trees with two tree ensemble methods: <a href="https://issues.apache.org/jira/browse/SPARK-1545">random forests</a> and <a href="https://issues.apache.org/jira/browse/SPARK-1547">gradient-boosted trees</a>, among the most successful tree-based models for classification and regression. Finally, MLlib&#8217;s Python implementation receives a major update in 1.2 to simplify the process of adding Python APIs, along with better Python API coverage.</p>
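A concrete sketch of the pipeline idea, using the Python API that (per the 1.3 notes below) arrived one release later; `sqlContext` and the toy data are assumptions:

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.ml.classification import LogisticRegression

# A toy training set; assumes an existing SQLContext `sqlContext`.
training = sqlContext.createDataFrame(
    [("spark is fast", 1.0), ("hadoop mapreduce", 0.0)], ["text", "label"])

# Three stages run in sequence: tokenize, hash to feature vectors, fit a model.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol="words", outputCol="features")
lr = LogisticRegression(maxIter=10)
model = Pipeline(stages=[tokenizer, hashingTF, lr]).fit(training)
```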
diff --git a/site/releases/spark-release-1-3-0.html b/site/releases/spark-release-1-3-0.html
index f37a59d3c..ada8d5966 100644
--- a/site/releases/spark-release-1-3-0.html
+++ b/site/releases/spark-release-1-3-0.html
@@ -191,7 +191,7 @@
<p>To download Spark 1.3 visit the <a href="/downloads.html">downloads</a> page.</p>
<h3 id="spark-core">Spark Core</h3>
-<p>Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports <a href="https://issues.apache.org/jira/browse/SPARK-5430">multi level aggregation trees</a> to help speed up expensive reduce operations. <a href="https://issues.apache.org/jira/browse/SPARK-5063">Improved error reporting</a> has been added for certain gotcha operations. Spark&#8217;s Jetty dependency is <a href="https://issues.apache.org/jira/browse/SPARK-3996">now shaded</a> to help avoid conflicts with user programs. Spark now supports <a href="https://issues.apache.org/jira/browse/SPARK-3883">SSL encryption</a> for some communication endpoints. Finaly, realtime <a href="https://issues.apache.org/jira/browse/SPARK-3428">GC metrics</a> and <a href="https://issues.apache.org/jira/browse/SPARK-4874">record counts</a> have been added to the UI. </p>
+<p>Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports <a href="https://issues.apache.org/jira/browse/SPARK-5430">multi-level aggregation trees</a> to help speed up expensive reduce operations. <a href="https://issues.apache.org/jira/browse/SPARK-5063">Improved error reporting</a> has been added for certain gotcha operations. Spark&#8217;s Jetty dependency is <a href="https://issues.apache.org/jira/browse/SPARK-3996">now shaded</a> to help avoid conflicts with user programs. Spark now supports <a href="https://issues.apache.org/jira/browse/SPARK-3883">SSL encryption</a> for some communication endpoints. Finally, real-time <a href="https://issues.apache.org/jira/browse/SPARK-3428">GC metrics</a> and <a href="https://issues.apache.org/jira/browse/SPARK-4874">record counts</a> have been added to the UI.</p>
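For example, the multi-level trees surface in the core API as `treeReduce`/`treeAggregate` (SPARK-5430 moved them out of MLlib); a quick PySpark sketch, assuming an existing SparkContext `sc`:

```python
# Aggregate partial results through a tree of intermediate reducers instead
# of sending every partition's result straight to the driver.
rdd = sc.parallelize(range(100000), numSlices=100)
total = rdd.treeReduce(lambda a, b: a + b, depth=2)  # depth controls tree height
```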
<h3 id="dataframe-api">DataFrame API</h3>
<p>Spark 1.3 adds a new <a href="/docs/1.3.0/sql-programming-guide.html#dataframes">DataFrames API</a> that provides powerful and convenient operators when working with structured datasets. The DataFrame is an evolution of the base RDD API that includes named fields along with schema information. It’s easy to construct a DataFrame from sources such as Hive tables, JSON data, a JDBC database, or any implementation of Spark’s new data source API. Data frames will become a common interchange format between Spark components and when importing and exporting data to other systems. Data frames are supported in Python, Scala, and Java.</p>
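A brief PySpark sketch of the DataFrame operators described above; it assumes an existing `SQLContext` named `sqlContext`, and the path is a placeholder:

```python
# Construct a DataFrame from JSON; named fields and schema come along for free.
df = sqlContext.jsonFile("hdfs://.../people.json")
df.printSchema()

# Relational-style operators on structured data.
df.filter(df.age > 21).select(df.name).show()
df.groupBy("age").count().show()
```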
@@ -203,7 +203,7 @@
<p>In this release, Spark MLlib introduces several new algorithms: latent Dirichlet allocation (LDA) for <a href="https://issues.apache.org/jira/browse/SPARK-1405">topic modeling</a>, <a href="https://issues.apache.org/jira/browse/SPARK-2309">multinomial logistic regression</a> for multiclass classification, <a href="https://issues.apache.org/jira/browse/SPARK-5012">Gaussian mixture model (GMM)</a> and <a href="https://issues.apache.org/jira/browse/SPARK-4259">power iteration clustering</a> for clustering, <a href="https://issues.apache.org/jira/browse/SPARK-4001">FP-growth</a> for frequent pattern mining, and <a href="https://issues.apache.org/jira/browse/SPARK-4409">block matrix abstraction</a> for distributed linear algebra. Initial support has been added for <a href="https://issues.apache.org/jira/browse/SPARK-4587">model import/export</a> in exchangeable format, which will be expanded in future versions to cover more model types in Java/Python/Scala. The implementations of k-means and ALS receive <a href="https://issues.apache.org/jira/browse/SPARK-3424, https://issues.apache.org/jira/browse/SPARK-3541">updates</a> that lead to significant performance gains. PySpark now supports the <a href="https://issues.apache.org/jira/browse/SPARK-4586">ML pipeline API</a> added in Spark 1.2, and <a href="https://issues.apache.org/jira/browse/SPARK-5094">gradient boosted trees</a> and <a href="https://issues.apache.org/jira/browse/SPARK-5012">Gaussian mixture model</a>. Finally, the ML pipeline API has been ported to support the new DataFrames abstraction.</p>
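A small sketch of the newly Python-accessible Gaussian mixture model; the toy points are made up, and `sc` is an existing SparkContext:

```python
from pyspark.mllib.clustering import GaussianMixture

# Fit a two-component mixture to a handful of 1-D points.
points = sc.parallelize([[0.1], [0.2], [0.3], [3.9], [4.0], [4.1]])
model = GaussianMixture.train(points, k=2)
print(model.weights)                     # mixing weights
print(model.predict(points).collect())   # cluster assignments
```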
<h3 id="spark-streaming">Spark Streaming</h3>
-<p>Spark 1.3 introduces a new <a href="https://issues.apache.org/jira/browse/SPARK-4964"><em>direct</em> Kafka API</a> (<a href="http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html">docs</a>) which enables exactly-once delivery without the use of write ahead logs. It also adds a <a href="https://issues.apache.org/jira/browse/SPARK-5047">Python Kafka API</a> along with infrastructure for additional Python API’s in future releases. An online version of <a href="https://issues.apache.org/jira/browse/SPARK-4979">logistic regression</a> and the ability to read <a href="https://issues.apache.org/jira/browse/SPARK-4969">binary records</a> have also been added. For stateful operations, support has been added for loading of an <a href="https://issues.apache.org/jira/browse/SPARK-3660">initial state RDD</a>. Finally, the streaming programming guide has been updated to include information about SQL and DataFrame operations within streaming applications, and important clarifications to the fault-tolerance semantics. </p>
+<p>Spark 1.3 introduces a new <a href="https://issues.apache.org/jira/browse/SPARK-4964"><em>direct</em> Kafka API</a> (<a href="http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html">docs</a>) which enables exactly-once delivery without the use of write ahead logs. It also adds a <a href="https://issues.apache.org/jira/browse/SPARK-5047">Python Kafka API</a> along with infrastructure for additional Python APIs in future releases. An online version of <a href="https://issues.apache.org/jira/browse/SPARK-4979">logistic regression</a> and the ability to read <a href="https://issues.apache.org/jira/browse/SPARK-4969">binary records</a> have also been added. For stateful operations, support has been added for loading of an <a href="https://issues.apache.org/jira/browse/SPARK-3660">initial state RDD</a>. Finally, the streaming programming guide has been updated to include information about SQL and DataFrame operations within streaming applications, and important clarifications to the fault-tolerance semantics.</p>
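The direct API is Scala/Java in this release, but the new Python Kafka API (SPARK-5047) can be sketched as follows; the ZooKeeper address, group id, and topic are placeholders, and the spark-streaming-kafka artifact is assumed to be on the classpath:

```python
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

ssc = StreamingContext(sc, 2)  # assumes an existing SparkContext `sc`

# Receiver-based Kafka stream: (zkQuorum, consumer group, {topic: threads}).
stream = KafkaUtils.createStream(ssc, "zk-host:2181", "example-group",
                                 {"events": 1})
stream.map(lambda kv: kv[1]).pprint()   # values only
ssc.start()
ssc.awaitTermination()
```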
<h3 id="graphx">GraphX</h3>
<p>GraphX adds a handful of utility functions in this release, including conversion into a <a href="https://issues.apache.org/jira/browse/SPARK-4917">canonical edge graph</a>.</p>
@@ -219,7 +219,7 @@
<ul>
<li><a href="https://issues.apache.org/jira/browse/SPARK-6194">SPARK-6194</a>: A memory leak in PySPark&#8217;s <code>collect()</code>.</li>
<li><a href="https://issues.apache.org/jira/browse/SPARK-6222">SPARK-6222</a>: An issue with failure recovery in Spark Streaming.</li>
- <li><a href="https://issues.apache.org/jira/browse/SPARK-6315">SPARK-6315</a>: Spark SQL can&#8217;t read parquet data generated with Spark 1.1. </li>
+ <li><a href="https://issues.apache.org/jira/browse/SPARK-6315">SPARK-6315</a>: Spark SQL can&#8217;t read parquet data generated with Spark 1.1.</li>
<li><a href="https://issues.apache.org/jira/browse/SPARK-6247">SPARK-6247</a>: Errors analyzing certain join types in Spark SQL.</li>
</ul>
diff --git a/site/releases/spark-release-1-3-1.html b/site/releases/spark-release-1-3-1.html
index 5c444b632..e0b6dc73c 100644
--- a/site/releases/spark-release-1-3-1.html
+++ b/site/releases/spark-release-1-3-1.html
@@ -196,10 +196,10 @@
<h4 id="spark-sql">Spark SQL</h4>
<ul>
<li>Unable to use reserved words in DDL (<a href="http://issues.apache.org/jira/browse/SPARK-6250">SPARK-6250</a>)</li>
- <li>Parquet no longer caches metadata (<a href="http://issues.apache.org/jira/browse/SPARK-6575">SPARK-6575</a>) </li>
+ <li>Parquet no longer caches metadata (<a href="http://issues.apache.org/jira/browse/SPARK-6575">SPARK-6575</a>)</li>
<li>Bug when joining two Parquet tables (<a href="http://issues.apache.org/jira/browse/SPARK-6851">SPARK-6851</a>)</li>
- <li>Unable to read parquet data generated by Spark 1.1.1 (<a href="http://issues.apache.org/jira/browse/SPARK-6315">SPARK-6315</a>) </li>
- <li>Parquet data source may use wrong Hadoop FileSystem (<a href="http://issues.apache.org/jira/browse/SPARK-6330">SPARK-6330</a>) </li>
+ <li>Unable to read parquet data generated by Spark 1.1.1 (<a href="http://issues.apache.org/jira/browse/SPARK-6315">SPARK-6315</a>)</li>
+ <li>Parquet data source may use wrong Hadoop FileSystem (<a href="http://issues.apache.org/jira/browse/SPARK-6330">SPARK-6330</a>)</li>
</ul>
<h4 id="spark-streaming">Spark Streaming</h4>
diff --git a/site/releases/spark-release-1-4-0.html b/site/releases/spark-release-1-4-0.html
index e6e1f0286..67f734c1c 100644
--- a/site/releases/spark-release-1-4-0.html
+++ b/site/releases/spark-release-1-4-0.html
@@ -250,7 +250,7 @@ Python coverage. MLlib also adds several new algorithms.</p>
</ul>
<h3 id="spark-streaming">Spark Streaming</h3>
-<p>Spark streaming adds visual instrumentation graphs and significantly improved debugging information in the UI. It also enhances support for both Kafka and Kinesis. </p>
+<p>Spark streaming adds visual instrumentation graphs and significantly improved debugging information in the UI. It also enhances support for both Kafka and Kinesis.</p>
<ul>
<li><a href="https://issues.apache.org/jira/browse/SPARK-7602">SPARK-7602</a>: Visualization and monitoring in the streaming UI including batch drill down (<a href="https://issues.apache.org/jira/browse/SPARK-6796">SPARK-6796</a>, <a href="https://issues.apache.org/jira/browse/SPARK-6862">SPARK-6862</a>)</li>
@@ -276,7 +276,7 @@ Python coverage. MLlib also adds several new algorithms.</p>
<h4 id="test-partners">Test Partners</h4>
-<p>Thanks to The following organizations, who helped benchmark or integration test release candidates: <br /> Intel, Palantir, Cloudera, Mesosphere, Huawei, Shopify, Netflix, Yahoo, UC Berkeley and Databricks. </p>
+<p>Thanks to the following organizations, which helped benchmark or integration test release candidates: <br /> Intel, Palantir, Cloudera, Mesosphere, Huawei, Shopify, Netflix, Yahoo, UC Berkeley and Databricks.</p>
<h4 id="contributors">Contributors</h4>
<ul>
diff --git a/site/releases/spark-release-1-5-0.html b/site/releases/spark-release-1-5-0.html
index 9cd7351c4..1d4b70e72 100644
--- a/site/releases/spark-release-1-5-0.html
+++ b/site/releases/spark-release-1-5-0.html
@@ -191,25 +191,25 @@
<p>You can consult JIRA for the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&amp;version=12332078">detailed changes</a>. We have curated a list of high level changes here:</p>
<ul id="markdown-toc">
- <li><a href="#apis-rdd-dataframe-and-sql">APIs: RDD, DataFrame and SQL</a></li>
- <li><a href="#backend-execution-dataframe-and-sql">Backend Execution: DataFrame and SQL</a></li>
- <li><a href="#integrations-data-sources-hive-hadoop-mesos-and-cluster-management">Integrations: Data Sources, Hive, Hadoop, Mesos and Cluster Management</a></li>
- <li><a href="#r-language">R Language</a></li>
- <li><a href="#machine-learning-and-advanced-analytics">Machine Learning and Advanced Analytics</a></li>
- <li><a href="#spark-streaming">Spark Streaming</a></li>
- <li><a href="#deprecations-removals-configs-and-behavior-changes">Deprecations, Removals, Configs, and Behavior Changes</a> <ul>
- <li><a href="#spark-core">Spark Core</a></li>
- <li><a href="#spark-sql--dataframes">Spark SQL &amp; DataFrames</a></li>
- <li><a href="#spark-streaming-1">Spark Streaming</a></li>
- <li><a href="#mllib">MLlib</a></li>
+ <li><a href="#apis-rdd-dataframe-and-sql" id="markdown-toc-apis-rdd-dataframe-and-sql">APIs: RDD, DataFrame and SQL</a></li>
+ <li><a href="#backend-execution-dataframe-and-sql" id="markdown-toc-backend-execution-dataframe-and-sql">Backend Execution: DataFrame and SQL</a></li>
+ <li><a href="#integrations-data-sources-hive-hadoop-mesos-and-cluster-management" id="markdown-toc-integrations-data-sources-hive-hadoop-mesos-and-cluster-management">Integrations: Data Sources, Hive, Hadoop, Mesos and Cluster Management</a></li>
+ <li><a href="#r-language" id="markdown-toc-r-language">R Language</a></li>
+ <li><a href="#machine-learning-and-advanced-analytics" id="markdown-toc-machine-learning-and-advanced-analytics">Machine Learning and Advanced Analytics</a></li>
+ <li><a href="#spark-streaming" id="markdown-toc-spark-streaming">Spark Streaming</a></li>
+ <li><a href="#deprecations-removals-configs-and-behavior-changes" id="markdown-toc-deprecations-removals-configs-and-behavior-changes">Deprecations, Removals, Configs, and Behavior Changes</a> <ul>
+ <li><a href="#spark-core" id="markdown-toc-spark-core">Spark Core</a></li>
+ <li><a href="#spark-sql--dataframes" id="markdown-toc-spark-sql--dataframes">Spark SQL &amp; DataFrames</a></li>
+ <li><a href="#spark-streaming-1" id="markdown-toc-spark-streaming-1">Spark Streaming</a></li>
+ <li><a href="#mllib" id="markdown-toc-mllib">MLlib</a></li>
</ul>
</li>
- <li><a href="#known-issues">Known Issues</a> <ul>
- <li><a href="#sqldataframe">SQL/DataFrame</a></li>
- <li><a href="#streaming">Streaming</a></li>
+ <li><a href="#known-issues" id="markdown-toc-known-issues">Known Issues</a> <ul>
+ <li><a href="#sqldataframe" id="markdown-toc-sqldataframe">SQL/DataFrame</a></li>
+ <li><a href="#streaming" id="markdown-toc-streaming">Streaming</a></li>
</ul>
</li>
- <li><a href="#credits">Credits</a></li>
+ <li><a href="#credits" id="markdown-toc-credits">Credits</a></li>
</ul>
<h3 id="apis-rdd-dataframe-and-sql">APIs: RDD, DataFrame and SQL</h3>
diff --git a/site/releases/spark-release-1-6-0.html b/site/releases/spark-release-1-6-0.html
index 64b56c374..5fad2842f 100644
--- a/site/releases/spark-release-1-6-0.html
+++ b/site/releases/spark-release-1-6-0.html
@@ -191,13 +191,13 @@
<p>You can consult JIRA for the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12333083&amp;projectId=12315420">detailed changes</a>. We have curated a list of high level changes here:</p>
<ul id="markdown-toc">
- <li><a href="#spark-coresql">Spark Core/SQL</a></li>
- <li><a href="#spark-streaming">Spark Streaming</a></li>
- <li><a href="#mllib">MLlib</a></li>
- <li><a href="#deprecations">Deprecations</a></li>
- <li><a href="#changes-of-behavior">Changes of behavior</a></li>
- <li><a href="#known-issues">Known issues</a></li>
- <li><a href="#credits">Credits</a></li>
+ <li><a href="#spark-coresql" id="markdown-toc-spark-coresql">Spark Core/SQL</a></li>
+ <li><a href="#spark-streaming" id="markdown-toc-spark-streaming">Spark Streaming</a></li>
+ <li><a href="#mllib" id="markdown-toc-mllib">MLlib</a></li>
+ <li><a href="#deprecations" id="markdown-toc-deprecations">Deprecations</a></li>
+ <li><a href="#changes-of-behavior" id="markdown-toc-changes-of-behavior">Changes of behavior</a></li>
+ <li><a href="#known-issues" id="markdown-toc-known-issues">Known issues</a></li>
+ <li><a href="#credits" id="markdown-toc-credits">Credits</a></li>
</ul>
<h3 id="spark-coresql">Spark Core/SQL</h3>
@@ -220,7 +220,7 @@
<ul>
<li><a href="https://issues.apache.org/jira/browse/SPARK-10000">SPARK-10000</a> <strong>Unified Memory Management</strong> - Shared memory for execution and caching instead of exclusive division of the regions.</li>
<li><a href="https://issues.apache.org/jira/browse/SPARK-11787">SPARK-11787</a> <strong>Parquet Performance</strong> - Improve Parquet scan performance when using flat schemas.</li>
- <li><a href="https://issues.apache.org/jira/browse/SPARK-9241">SPARK-9241&#160;</a> <strong>Improved query planner for queries having distinct aggregations</strong> - Query plans of distinct aggregations are more robust when distinct columns have high cardinality. </li>
+ <li><a href="https://issues.apache.org/jira/browse/SPARK-9241">SPARK-9241&#160;</a> <strong>Improved query planner for queries having distinct aggregations</strong> - Query plans of distinct aggregations are more robust when distinct columns have high cardinality.</li>
<li><a href="https://issues.apache.org/jira/browse/SPARK-9858">SPARK-9858&#160;</a> <strong>Adaptive query execution</strong> - Initial support for automatically selecting the number of reducers for joins and aggregations.</li>
<li><a href="https://issues.apache.org/jira/browse/SPARK-10978">SPARK-10978</a> <strong>Avoiding double filters in Data Source API</strong> - When implementing a data source with filter pushdown, developers can now tell Spark SQL to avoid double evaluating a pushed-down filter.</li>
<li><a href="https://issues.apache.org/jira/browse/SPARK-11111">SPARK-11111</a> <strong>Fast null-safe joins</strong> - Joins using null-safe equality (<code>&lt;=&gt;</code>) will now execute using SortMergeJoin instead of computing a cartisian product.</li>
@@ -233,7 +233,7 @@
<h3 id="spark-streaming">Spark Streaming</h3>
<ul>
- <li><strong>API Updates</strong>
+ <li><strong>API Updates</strong>
<ul>
<li><a href="https://issues.apache.org/jira/browse/SPARK-2629">SPARK-2629&#160;</a> <strong>New improved state management</strong> - <code>mapWithState</code> - a DStream transformation for stateful stream processing, supercedes <code>updateStateByKey</code> in functionality and performance.</li>
<li><a href="https://issues.apache.org/jira/browse/SPARK-11198">SPARK-11198</a> <strong>Kinesis record deaggregation</strong> - Kinesis streams have been upgraded to use KCL 1.4.0 and supports transparent deaggregation of KPL-aggregated records.</li>
@@ -244,7 +244,7 @@
<li><strong>UI Improvements</strong>
<ul>
<li>Made failures visible in the streaming tab, in the timelines, batch list, and batch details page.</li>
- <li>Made output operations visible in the streaming tab as progress bars. </li>
+ <li>Made output operations visible in the streaming tab as progress bars.</li>
</ul>
</li>
</ul>
diff --git a/site/releases/spark-release-2-0-0.html b/site/releases/spark-release-2-0-0.html
index 7aa2fc38b..3d373a288 100644
--- a/site/releases/spark-release-2-0-0.html
+++ b/site/releases/spark-release-2-0-0.html
@@ -191,30 +191,30 @@
<p>To download Apache Spark 2.0.0, visit the <a href="http://spark.apache.org/downloads.html">downloads</a> page. You can consult JIRA for the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&amp;version=12329449">detailed changes</a>. We have curated a list of high level changes here, grouped by major modules.</p>
<ul id="markdown-toc">
- <li><a href="#api-stability">API Stability</a></li>
- <li><a href="#core-and-spark-sql">Core and Spark SQL</a> <ul>
- <li><a href="#programming-apis">Programming APIs</a></li>
- <li><a href="#sql">SQL</a></li>
- <li><a href="#new-features">New Features</a></li>
- <li><a href="#performance-and-runtime">Performance and Runtime</a></li>
+ <li><a href="#api-stability" id="markdown-toc-api-stability">API Stability</a></li>
+ <li><a href="#core-and-spark-sql" id="markdown-toc-core-and-spark-sql">Core and Spark SQL</a> <ul>
+ <li><a href="#programming-apis" id="markdown-toc-programming-apis">Programming APIs</a></li>
+ <li><a href="#sql" id="markdown-toc-sql">SQL</a></li>
+ <li><a href="#new-features" id="markdown-toc-new-features">New Features</a></li>
+ <li><a href="#performance-and-runtime" id="markdown-toc-performance-and-runtime">Performance and Runtime</a></li>
</ul>
</li>
- <li><a href="#mllib">MLlib</a> <ul>
- <li><a href="#new-features-1">New features</a></li>
- <li><a href="#speedscaling">Speed/scaling</a></li>
+ <li><a href="#mllib" id="markdown-toc-mllib">MLlib</a> <ul>
+ <li><a href="#new-features-1" id="markdown-toc-new-features-1">New features</a></li>
+ <li><a href="#speedscaling" id="markdown-toc-speedscaling">Speed/scaling</a></li>
</ul>
</li>
- <li><a href="#sparkr">SparkR</a></li>
- <li><a href="#streaming">Streaming</a></li>
- <li><a href="#dependency-packaging-and-operations">Dependency, Packaging, and Operations</a></li>
- <li><a href="#removals-behavior-changes-and-deprecations">Removals, Behavior Changes and Deprecations</a> <ul>
- <li><a href="#removals">Removals</a></li>
- <li><a href="#behavior-changes">Behavior Changes</a></li>
- <li><a href="#deprecations">Deprecations</a></li>
+ <li><a href="#sparkr" id="markdown-toc-sparkr">SparkR</a></li>
+ <li><a href="#streaming" id="markdown-toc-streaming">Streaming</a></li>
+ <li><a href="#dependency-packaging-and-operations" id="markdown-toc-dependency-packaging-and-operations">Dependency, Packaging, and Operations</a></li>
+ <li><a href="#removals-behavior-changes-and-deprecations" id="markdown-toc-removals-behavior-changes-and-deprecations">Removals, Behavior Changes and Deprecations</a> <ul>
+ <li><a href="#removals" id="markdown-toc-removals">Removals</a></li>
+ <li><a href="#behavior-changes" id="markdown-toc-behavior-changes">Behavior Changes</a></li>
+ <li><a href="#deprecations" id="markdown-toc-deprecations">Deprecations</a></li>
</ul>
</li>
- <li><a href="#known-issues">Known Issues</a></li>
- <li><a href="#credits">Credits</a></li>
+ <li><a href="#known-issues" id="markdown-toc-known-issues">Known Issues</a></li>
+ <li><a href="#credits" id="markdown-toc-credits">Credits</a></li>
</ul>
<h3 id="api-stability">API Stability</h3>