<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

  <title>
     Spark Release 0.3 | Apache Spark
    
  </title>

  

  

  <!-- Bootstrap core CSS -->
  <link href="/css/cerulean.min.css" rel="stylesheet">
  <link href="/css/custom.css" rel="stylesheet">

  <!-- Code highlighter CSS -->
  <link href="/css/pygments-default.css" rel="stylesheet">

  <script type="text/javascript">
  <!-- Google Analytics initialization -->
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-32518208-2']);
  _gaq.push(['_trackPageview']);
  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

  <!-- Adds slight delay to links to allow async reporting -->
  function trackOutboundLink(link, category, action) {
    try {
      _gaq.push(['_trackEvent', category , action]);
    } catch(err){}

    setTimeout(function() {
      document.location.href = link.href;
    }, 100);
  }
  </script>

  <!-- HTML5 shim and Respond.js IE8 support of HTML5 elements and media queries -->
  <!--[if lt IE 9]>
  <script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
  <script src="https://oss.maxcdn.com/libs/respond.js/1.3.0/respond.min.js"></script>
  <![endif]-->
</head>

<body>

<script src="https://code.jquery.com/jquery.js"></script>
<script src="//netdna.bootstrapcdn.com/bootstrap/3.0.3/js/bootstrap.min.js"></script>
<script src="/js/lang-tabs.js"></script>
<script src="/js/downloads.js"></script>

<div class="container" style="max-width: 1200px;">

<div class="masthead">
  
    <p class="lead">
      <a href="/">
      <img src="/images/spark-logo-trademark.png"
        style="height:100px; width:auto; vertical-align: bottom; margin-top: 20px;"></a><span class="tagline">
          Lightning-fast cluster computing
      </span>
    </p>
  
</div>

<nav class="navbar navbar-default" role="navigation">
  <!-- Brand and toggle get grouped for better mobile display -->
  <div class="navbar-header">
    <button type="button" class="navbar-toggle" data-toggle="collapse"
            data-target="#navbar-collapse-1">
      <span class="sr-only">Toggle navigation</span>
      <span class="icon-bar"></span>
      <span class="icon-bar"></span>
      <span class="icon-bar"></span>
    </button>
  </div>

  <!-- Collect the nav links, forms, and other content for toggling -->
  <div class="collapse navbar-collapse" id="navbar-collapse-1">
    <ul class="nav navbar-nav">
      <li><a href="/downloads.html">Download</a></li>
      <li class="dropdown">
        <a href="#" class="dropdown-toggle" data-toggle="dropdown">
          Libraries <b class="caret"></b>
        </a>
        <ul class="dropdown-menu">
          <li><a href="/sql/">SQL and DataFrames</a></li>
          <li><a href="/streaming/">Spark Streaming</a></li>
          <li><a href="/mllib/">MLlib (machine learning)</a></li>
          <li><a href="/graphx/">GraphX (graph)</a></li>
          <li class="divider"></li>
          <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects">Third-Party Packages</a></li>
        </ul>
      </li>
      <li class="dropdown">
        <a href="#" class="dropdown-toggle" data-toggle="dropdown">
          Documentation <b class="caret"></b>
        </a>
        <ul class="dropdown-menu">
          <li><a href="/docs/latest/">Latest Release (Spark 2.0.0)</a></li>
          <li><a href="/documentation.html">Older Versions and Other Resources</a></li>
        </ul>
      </li>
      <li><a href="/examples.html">Examples</a></li>
      <li class="dropdown">
        <a href="/community.html" class="dropdown-toggle" data-toggle="dropdown">
          Community <b class="caret"></b>
        </a>
        <ul class="dropdown-menu">
          <li><a href="/community.html">Mailing Lists</a></li>
          <li><a href="/community.html#events">Events and Meetups</a></li>
          <li><a href="/community.html#history">Project History</a></li>
          <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark">Powered By</a></li>
          <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Committers">Project Committers</a></li>
          <li><a href="https://issues.apache.org/jira/browse/SPARK">Issue Tracker</a></li>
        </ul>
      </li>
      <li><a href="/faq.html">FAQ</a></li>
    </ul>
    <ul class="nav navbar-nav navbar-right">
      <li class="dropdown">
        <a href="http://www.apache.org/" class="dropdown-toggle" data-toggle="dropdown">
          Apache Software Foundation <b class="caret"></b></a>
        <ul class="dropdown-menu">
          <li><a href="http://www.apache.org/">Apache Homepage</a></li>
          <li><a href="http://www.apache.org/licenses/">License</a></li>
          <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
          <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
          <li><a href="http://www.apache.org/security/">Security</a></li>
        </ul>
      </li>
    </ul>
  </div>
  <!-- /.navbar-collapse -->
</nav>


<div class="row">
  <div class="col-md-3 col-md-push-9">
    <div class="news" style="margin-bottom: 20px;">
      <h5>Latest News</h5>
      <ul class="list-unstyled">
        
          <li><a href="/news/spark-2-0-1-released.html">Spark 2.0.1 released</a>
          <span class="small">(Oct 03, 2016)</span></li>
        
          <li><a href="/news/spark-2-0-0-released.html">Spark 2.0.0 released</a>
          <span class="small">(Jul 26, 2016)</span></li>
        
          <li><a href="/news/spark-1-6-2-released.html">Spark 1.6.2 released</a>
          <span class="small">(Jun 25, 2016)</span></li>
        
          <li><a href="/news/submit-talks-to-spark-summit-eu-2016.html">Call for Presentations for Spark Summit EU is Open</a>
          <span class="small">(Jun 16, 2016)</span></li>
        
      </ul>
      <p class="small" style="text-align: right;"><a href="/news/index.html">Archive</a></p>
    </div>
    <div class="hidden-xs hidden-sm">
      <a href="/downloads.html" class="btn btn-success btn-lg btn-block" style="margin-bottom: 30px;">
        Download Spark
      </a>
      <p style="font-size: 16px; font-weight: 500; color: #555;">
        Built-in Libraries:
      </p>
      <ul class="list-none">
        <li><a href="/sql/">SQL and DataFrames</a></li>
        <li><a href="/streaming/">Spark Streaming</a></li>
        <li><a href="/mllib/">MLlib (machine learning)</a></li>
        <li><a href="/graphx/">GraphX (graph)</a></li>
      </ul>
      <a href="https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects">Third-Party Packages</a>
    </div>
  </div>

  <div class="col-md-9 col-md-pull-3">
    <h2>Spark Release 0.3</h2>


<p>Spark 0.3 brings a variety of new features. You can download it for either <a href="https://github.com/mesos/spark/tarball/0.3-scala-2.9">Scala 2.9</a> or <a href="https://github.com/mesos/spark/tarball/0.3-scala-2.8">Scala 2.8</a>.</p>

<h3>Scala 2.9 Support</h3>

<p>This is the first release to support Scala 2.9 in addition to 2.8. Future releases are likely to be 2.9-only unless there is high demand for 2.8.</p>

<h3>Save Operations</h3>

<p>You can now save distributed datasets to the Hadoop Distributed File System (HDFS), Amazon S3, Hypertable, and any other storage system supported by Hadoop. There are convenience methods for several common formats, such as text files and SequenceFiles. For example, to save a dataset as text:</p>

<div class="code">
<span class="comment">// Writes each element as one line of text</span><br />
<span class="keyword">val</span> numbers = spark.parallelize(1 to 100)<br />
numbers.<span class="sparkop">saveAsTextFile</span>(<span class="string">"hdfs://..."</span>)
</div>

<h3>Native Types for SequenceFiles</h3>

<p>When working with SequenceFiles, which store objects that implement Hadoop&#8217;s Writable interface, Spark now lets you use native Scala types in place of common Writable types, such as <tt>Int</tt> for <tt>IntWritable</tt> and <tt>String</tt> for <tt>Text</tt>. For example:</p>

<div class="code">
<span class="comment">// Will read a SequenceFile of (IntWritable, Text)</span><br />
<span class="keyword">val</span> data = spark.sequenceFile[Int, String](<span class="string">"hdfs://..."</span>)
</div>

<p>Similarly, you can save datasets of basic types directly as SequenceFiles:</p>

<div class="code">
<span class="comment">// Will write a SequenceFile of (IntWritable, IntWritable)</span><br />
<span class="keyword">val</span> squares = spark.parallelize(1 to 100).<span class="sparkop">map</span>(<span class="closure">n =&gt; (n, n*n)</span>)<br />
squares.<span class="sparkop">saveAsSequenceFile</span>(<span class="string">"hdfs://..."</span>)
</div>
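<p>How can a generic call like <tt>sequenceFile[Int, String]</tt> know which Writable classes to expect from its type parameters alone? One common Scala technique is implicit type-class resolution. The sketch below is a simplified, hypothetical model of that idea, not Spark&#8217;s actual code: the <tt>WritableName</tt> type class and the <tt>sequenceFileTypes</tt> helper are invented for illustration, and they only return Writable class names rather than reading or writing any Hadoop data.</p>

```scala
// Hypothetical type class: names the Hadoop Writable class that stands in
// for a native Scala type. Illustrative only; this is not Spark's API.
trait WritableName[T] {
  def name: String
}

object WritableName {
  // Small helper to build instances concisely.
  private def of[T](n: String): WritableName[T] = new WritableName[T] {
    val name: String = n
  }

  // Instances live in the companion object, so they are found automatically.
  implicit val intName: WritableName[Int] = of[Int]("IntWritable")
  implicit val longName: WritableName[Long] = of[Long]("LongWritable")
  implicit val doubleName: WritableName[Double] = of[Double]("DoubleWritable")
  implicit val stringName: WritableName[String] = of[String]("Text")
}

object SequenceFileTypes {
  // Resolves the (key, value) Writable class names purely from K and V,
  // the way a call like sequenceFile[Int, String] could.
  def sequenceFileTypes[K, V](implicit k: WritableName[K], v: WritableName[V]): (String, String) =
    (k.name, v.name)

  def main(args: Array[String]): Unit = {
    println(sequenceFileTypes[Int, String])
    println(sequenceFileTypes[Int, Int])
  }
}
```

<p>In this sketch, asking for a type with no instance, such as <tt>sequenceFileTypes[Int, java.util.Date]</tt>, fails at compile time rather than at run time, which is the main appeal of resolving the mapping through implicits.</p>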

<h3>Maven Integration</h3>

<p>Spark now fetches dependencies via Maven and can publish Maven artifacts for easier dependency management.</p>

<h3>Faster Broadcast &amp; Shuffle</h3>

<p>This release includes broadcast and shuffle algorithms from <a href="http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf">the Orchestra paper (SIGCOMM 2011)</a> to better support applications that transfer large amounts of data between nodes.</p>

<h3>Support for Non-Filesystem Hadoop Input Formats</h3>

<p>The new <tt>SparkContext.hadoopRDD</tt> method allows reading data from Hadoop-compatible storage systems other than filesystems, such as HBase and Hypertable.</p>

<h3>Other Features</h3>

<ul>
  <li>Outer join operators (<tt>leftOuterJoin</tt>, <tt>rightOuterJoin</tt>, etc.).</li>
  <li>Support for Scala 2.9 interpreter features (history search, Ctrl-C current line, etc.) in the 2.9 version.</li>
  <li>Better default levels of parallelism for various operations.</li>
  <li>Ability to control the number of splits when reading a file.</li>
  <li>Various bug fixes.</li>
</ul>
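<p>Outer joins keep keys that appear on only one side. The following plain-Scala sketch over in-memory sequences illustrates the semantics; it is not Spark&#8217;s implementation, and the <tt>leftOuterJoin</tt> and <tt>rightOuterJoin</tt> helpers here are hypothetical stand-ins. Each left value is paired with <tt>Some</tt> matching right value, or with <tt>None</tt> when the key is missing on the right, following the <tt>(K, (V, Option[W]))</tt> shape that Spark&#8217;s pair-dataset <tt>leftOuterJoin</tt> returns (at least in later releases).</p>

```scala
object OuterJoinSketch {
  // Left outer join on (key, value) pairs: every left pair appears at least once;
  // keys missing on the right are paired with None.
  def leftOuterJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, Option[W]))] = {
    // Index the right side by key, keeping all values for duplicate keys in order.
    val rightByKey: Map[K, Seq[W]] =
      right.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    left.flatMap { case (k, v) =>
      rightByKey.get(k) match {
        case Some(ws) => ws.map(w => (k, (v, Some(w): Option[W]))) // one row per match
        case None     => Seq((k, (v, None: Option[W])))             // no match: keep left value
      }
    }
  }

  // Right outer join: swap the inputs, reuse the left join, then swap each value pair back.
  def rightOuterJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (Option[V], W))] =
    leftOuterJoin(right, left).map { case (k, (w, v)) => (k, (v, w)) }

  def main(args: Array[String]): Unit = {
    val ages = Seq("alice" -> 30, "bob" -> 25)
    val cities = Seq("alice" -> "Paris", "carol" -> "Rome")
    println(leftOuterJoin(ages, cities))
    println(rightOuterJoin(ages, cities))
  }
}
```

<p>Here <tt>bob</tt> survives the left join with <tt>None</tt> for a city, and <tt>carol</tt> survives the right join with <tt>None</tt> for an age; a plain inner join would drop both.</p>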


<p>
<br/>
<a href="/news/">Spark News Archive</a>
</p>

  </div>
</div>



<footer class="small">
  <hr>
  Apache Spark, Spark, Apache, and the Spark logo are <a href="/trademarks.html">trademarks</a> of
  <a href="http://www.apache.org">The Apache Software Foundation</a>.
</footer>

</div>

</body>
</html>