summaryrefslogblamecommitdiff
path: root/site/sql/index.html
blob: cc29f32c10c120b77c3f14d0544c3d105551d62b (plain) (tree)
1
2
3
4
5
6
7
8
9







                                                                        
                                              




          



                                                                                                                                                                                



                                                      


                                                          































                                                                                                                     




                                                                                     





                                                  
                                                 

























                                                                                      
                                         

                                  
                                                         


                                                                 
                                   
                                                                               






                                                                   
                                                                           
                                                                                       

             
                                                    
                           
                                                                                 






                                                                                                              

                                                                                                                



                                          












                                                                                              










                                                   


                                                                                


                                                                                


                                                                                                                              


                                                                                       







                                                                                                      
                           
          
                            
                                                       



                                                               
                                                                  




                                      
                                                                             





                                 
                                                         

        
                                                                                                                                                                                                                  





                                                                            
                                                                        
                                                                                                                                               








                                                                                                               
                                
                    
                                              

        
                                                                                                                                                                                   




                                                                            




                                                                                                                                                         









                                                                       
                                                   




















                                                                                              
                                                                                                          






























                                                                                                                                                  

                                   
                                          
       


                                                                                                                                                     


























                                                                                                   
                                                                                                                             






                                                                           
                                                                              



        

        




                      
                                                                                                                          




                                                                     

       
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

  <title>
     Spark SQL &amp; DataFrames | Apache Spark
    
  </title>

  

  
    <meta name="description" content="Spark SQL is Spark's module for working with structured data, either within Spark programs or through standard JDBC and ODBC connectors.">
  

  <!-- Bootstrap core CSS -->
  <link href="/css/cerulean.min.css" rel="stylesheet">
  <link href="/css/custom.css" rel="stylesheet">

  <!-- Code highlighter CSS -->
  <link href="/css/pygments-default.css" rel="stylesheet">

  <script type="text/javascript">
  <!-- Google Analytics initialization -->
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-32518208-2']);
  _gaq.push(['_trackPageview']);
  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

  <!-- Adds slight delay to links to allow async reporting -->
  function trackOutboundLink(link, category, action) {
    try {
      _gaq.push(['_trackEvent', category , action]);
    } catch(err){}

    setTimeout(function() {
      document.location.href = link.href;
    }, 100);
  }
  </script>

  <!-- HTML5 shim and Respond.js IE8 support of HTML5 elements and media queries -->
  <!--[if lt IE 9]>
  <script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
  <script src="https://oss.maxcdn.com/libs/respond.js/1.3.0/respond.min.js"></script>
  <![endif]-->
</head>

<body>

<script src="https://code.jquery.com/jquery.js"></script>
<script src="//netdna.bootstrapcdn.com/bootstrap/3.0.3/js/bootstrap.min.js"></script>
<script src="/js/lang-tabs.js"></script>
<script src="/js/downloads.js"></script>

<div class="container" style="max-width: 1200px;">

<div class="masthead">
  
    <p class="lead">
      <a href="/">
      <img src="/images/spark-logo-trademark.png"
      style="height:100px; width:auto; vertical-align: bottom; margin-top: 20px;"></a>
      <a href="#"><span class="subproject">
        SQL
      </span></a>
    </p>
  
</div>

<nav class="navbar navbar-default" role="navigation">
  <!-- Brand and toggle get grouped for better mobile display -->
  <div class="navbar-header">
    <button type="button" class="navbar-toggle" data-toggle="collapse"
            data-target="#navbar-collapse-1">
      <span class="sr-only">Toggle navigation</span>
      <span class="icon-bar"></span>
      <span class="icon-bar"></span>
      <span class="icon-bar"></span>
    </button>
  </div>

  <!-- Collect the nav links, forms, and other content for toggling -->
  <div class="collapse navbar-collapse" id="navbar-collapse-1">
    <ul class="nav navbar-nav">
      <li><a href="/downloads.html">Download</a></li>
      <li class="dropdown">
        <a href="#" class="dropdown-toggle" data-toggle="dropdown">
          Libraries <b class="caret"></b>
        </a>
        <ul class="dropdown-menu">
          <li><a href="/sql/">SQL and DataFrames</a></li>
          <li><a href="/streaming/">Spark Streaming</a></li>
          <li><a href="/mllib/">MLlib (machine learning)</a></li>
          <li><a href="/graphx/">GraphX (graph)</a></li>
          <li class="divider"></li>
          <li><a href="http://spark-packages.org">Third-Party Packages</a></li>
        </ul>
      </li>
      <li class="dropdown">
        <a href="#" class="dropdown-toggle" data-toggle="dropdown">
          Documentation <b class="caret"></b>
        </a>
        <ul class="dropdown-menu">
          <li><a href="/docs/latest/">Latest Release (Spark 2.0.0)</a></li>
          <li><a href="/documentation.html">Older Versions and Other Resources</a></li>
        </ul>
      </li>
      <li><a href="/examples.html">Examples</a></li>
      <li class="dropdown">
        <a href="/community.html" class="dropdown-toggle" data-toggle="dropdown">
          Community <b class="caret"></b>
        </a>
        <ul class="dropdown-menu">
          <li><a href="/community.html">Mailing Lists</a></li>
          <li><a href="/community.html#events">Events and Meetups</a></li>
          <li><a href="/community.html#history">Project History</a></li>
          <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark">Powered By</a></li>
          <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Committers">Project Committers</a></li>
          <li><a href="https://issues.apache.org/jira/browse/SPARK">Issue Tracker</a></li>
        </ul>
      </li>
      <li><a href="/faq.html">FAQ</a></li>
    </ul>
    <ul class="nav navbar-nav navbar-right">
      <li class="dropdown">
        <a href="http://www.apache.org/" class="dropdown-toggle" data-toggle="dropdown">
          Apache Software Foundation <b class="caret"></b></a>
        <ul class="dropdown-menu">
          <li><a href="http://www.apache.org/">Apache Homepage</a></li>
          <li><a href="http://www.apache.org/licenses/">License</a></li>
          <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
          <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
          <li><a href="http://www.apache.org/security/">Security</a></li>
        </ul>
      </li>
    </ul>
  </div>
  <!-- /.navbar-collapse -->
</nav>


<div class="row">
  <div class="col-md-3 col-md-push-9">
    <div class="news" style="margin-bottom: 20px;">
      <h5>Latest News</h5>
      <ul class="list-unstyled">
        
          <li><a href="/news/spark-2-0-0-released.html">Spark 2.0.0 released</a>
          <span class="small">(Jul 27, 2016)</span></li>
        
          <li><a href="/news/spark-1-6-2-released.html">Spark 1.6.2 released</a>
          <span class="small">(Jun 25, 2016)</span></li>
        
          <li><a href="/news/submit-talks-to-spark-summit-eu-2016.html">Call for Presentations for Spark Summit EU is Open</a>
          <span class="small">(Jun 16, 2016)</span></li>
        
          <li><a href="/news/spark-2.0.0-preview.html">Preview release of Spark 2.0</a>
          <span class="small">(May 26, 2016)</span></li>
        
      </ul>
      <p class="small" style="text-align: right;"><a href="/news/index.html">Archive</a></p>
    </div>
    <div class="hidden-xs hidden-sm">
      <a href="/downloads.html" class="btn btn-success btn-lg btn-block" style="margin-bottom: 30px;">
        Download Spark
      </a>
      <p style="font-size: 16px; font-weight: 500; color: #555;">
        Built-in Libraries:
      </p>
      <ul class="list-none">
        <li><a href="/sql/">SQL and DataFrames</a></li>
        <li><a href="/streaming/">Spark Streaming</a></li>
        <li><a href="/mllib/">MLlib (machine learning)</a></li>
        <li><a href="/graphx/">GraphX (graph)</a></li>
      </ul>
      <a href="http://spark-packages.org">Third-Party Packages</a>
    </div>
  </div>

  <div class="col-md-9 col-md-pull-3">
    <div class="jumbotron">
  <b>Spark SQL</b> is Apache Spark's module for working with structured data.
</div>

<div class="row row-padded">
  <div class="col-md-7 col-sm-7">
    <h2>Integrated</h2>
    <p class="lead">
	  Seamlessly mix SQL queries with Spark programs.
    </p>
    <p>
	  Spark SQL lets you query structured data inside Spark programs, using either SQL or a familiar <a href="/docs/latest/sql-programming-guide.html">DataFrame API</a>. Usable in Java, Scala, Python and R.
    </p>
  </div>
  <div class="col-md-5 col-sm-5 col-padded-top col-center">

    <div style="margin-top: 15px; text-align: left; display: inline-block;">
      <div class="code">
	    context = <span class="sparkop">HiveContext</span>(sc)<br />
		results = context.<span class="sparkop">sql</span>(<br />&nbsp;&nbsp;<span class="closure">"SELECT * FROM people"</span>)<br />
		names = results.<span class="sparkop">map</span>(<span class="closure">lambda p: p.name</span>)
	  </div>
      <div class="caption">Apply functions to results of SQL queries.</div>
    </div>
  </div>
</div>

<div class="row row-padded">
  <div class="col-md-7 col-sm-7">
    <h2>Uniform Data Access</h2>
    <p class="lead">
      Connect to any data source the same way.
    </p>
    <p>
      DataFrames and SQL provide a common way to access a variety of data sources, including Hive, Avro, Parquet, ORC, JSON, and JDBC. You can even join data across these sources.
    </p>
  </div>
  <div class="col-md-5 col-sm-5 col-padded-top col-center">
    <div style="margin-top: 15px; text-align: left; display: inline-block;">
      <div class="code">
		context.<span class="sparkop">jsonFile</span>(<span class="closure">"s3n://..."</span>)<br />&nbsp;&nbsp;.registerTempTable("json")<br />
		results = context.<span class="sparkop">sql</span>(<span class="closure"><br />
			&nbsp;&nbsp;"""SELECT * <br />
			&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;FROM people<br />
			&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;JOIN json ..."""</span>)<br />
	  </div>
      <div class="caption">Query and join different data sources.</div>
    </div>
  </div>
</div>

<div class="row row-padded">
  <div class="col-md-7 col-sm-7">
    <h2>Hive Compatibility</h2>
    <p class="lead">
      Run unmodified Hive queries on existing data.
    </p>
    <p>
      Spark SQL reuses the Hive frontend and metastore, giving you full compatibility with
      existing Hive data, queries, and UDFs. Simply install it alongside Hive.
    </p>
  </div>
  <div class="col-md-5 col-sm-5 col-padded-top col-center">
    <div style="width: 100%; max-width: 323px; display: inline-block">
      <img src="/images/sql-hive-arch.png" style="width: 100%; max-width: 323px;" />
      <div class="caption">Spark SQL can use existing Hive metastores, SerDes, and UDFs.</div>
    </div>
  </div>
</div>

<div class="row row-padded">
  <div class="col-md-7 col-sm-7">
    <h2>Standard Connectivity</h2>
    <p class="lead">
      Connect through JDBC or ODBC.
    </p>
    <p>
      A server mode provides industry standard JDBC and ODBC connectivity for business intelligence tools.
    </p>
  </div>
  <div class="col-md-5 col-sm-5 col-padded-top col-center">
    <div style="width: 100%; max-width: 323px; display: inline-block">
      <img src="/images/jdbc.png" style="width: 75%; max-width: 323px;" />
      <div class="caption">Use your existing BI tools to query big data.</div>
    </div>
  </div>
</div>

<!--
<div class="row row-padded">
  <div class="col-md-7 col-sm-7">
    <h2>Speed</h2>
    <p class="lead">
      Optimized to execute on Spark.
    </p>
    <p>
      Spark SQL was built using the Catalyst optimizer, which automatically rewrites your queries to execute more efficiently.
  	  By leveraging advanced techniques like runtime code generation, Spark SQL makes it easier to write lightning-fast analytic applications.
    </p>
  </div>
  <div class="col-md-5 col-sm-5 col-padded-top col-center">
    <div style="width: 100%; max-width: 272px; display: inline-block; text-align: center;">
      <img src="/images/sqlperf.png" style="width: 100%; max-width: 250px;">
      <div class="caption" style="min-width: 272px;">Performance comparison between Shark and Spark SQL</div>
    </div>
  </div>
</div>
-->

<div class="row">
  <div class="col-md-4 col-padded">
    <h3>Performance &amp; Scalability</h3>
    <p>
      Spark SQL includes a cost-based optimizer, columnar storage and code generation to make queries fast.
      At the same time, it scales to thousands of nodes and multi hour queries using the Spark engine, which provides full mid-query fault tolerance.
      Don't worry about using a different engine for historical data.
    </p>
  </div>

  <div class="col-md-4 col-padded">
    <h3>Community</h3>
    <p>
      Spark SQL is developed as part of Apache Spark. It thus gets
      tested and updated with each Spark release.
    </p>
    <p>
      If you have questions about the system, ask on the
      <a href="/community.html#mailing-lists">Spark mailing lists</a>.
    </p>
    <p>
      The Spark SQL developers welcome contributions. If you'd like to help out,
      read <a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">how to
      contribute to Spark</a>, and send us a patch!
    </p>
  </div>

  <div class="col-md-4 col-padded">
    <h3>Getting Started</h3>
    <p>
      To get started with Spark SQL:
    </p>
    <ul class="list-narrow">
      <li><a href="/downloads.html">Download Spark</a>. It includes Spark SQL as a module.</li>
      <li>Read the <a href="/docs/latest/sql-programming-guide.html">Spark SQL and DataFrame guide</a> to learn the API.</li>
    </ul>
  </div>
</div>

<div class="row">
  <div class="col-sm-12 col-center">
    <a href="/downloads.html" class="btn btn-success btn-lg btn-multiline">
      Download Apache Spark<br /><span class="small">Includes Spark SQL</span>
    </a>
  </div>
</div>

  </div>
</div>



<footer class="small">
  <hr>
  Apache Spark, Spark, Apache, and the Spark logo are <a href="https://www.apache.org/foundation/marks/">trademarks</a> of
  <a href="http://www.apache.org">The Apache Software Foundation</a>.
</footer>

</div>

</body>
</html>