author: Michael Armbrust <marmbrus@apache.org> 2014-07-01 21:42:50 +0000
committer: Michael Armbrust <marmbrus@apache.org> 2014-07-01 21:42:50 +0000
commit: e72b3d31c82f5bd4cf30675636dc04eb75ea47df
tree: bdd1b305a0ff44904add3e752349edaed3198c0e /sql/index.md
parent: 477178c7db5c8f4418c0d1e8dcdab59ca0fa9435
Update site with Spark SQL
Diffstat (limited to 'sql/index.md')
-rw-r--r-- sql/index.md | 172
1 files changed, 172 insertions, 0 deletions
diff --git a/sql/index.md b/sql/index.md
new file mode 100644
index 000000000..069cc5d8e
--- /dev/null
+++ b/sql/index.md
@@ -0,0 +1,172 @@
+---
+layout: global
+type: "page singular"
+title: Spark SQL
+subproject: SQL
+---
+
+
+<div class="jumbotron">
+ <b>Spark SQL</b> unifies access to structured data.
+</div>
+
+
+<div class="row row-padded">
+ <div class="col-md-7 col-sm-7">
+ <h2>Integrated</h2>
+ <p class="lead">
+ Seamlessly mix SQL queries with Spark programs.
+ </p>
+ <p>
+ Spark SQL lets you query structured data as a distributed dataset (RDD) in Spark, with integrated APIs in Python, Scala and Java.
+ This tight integration makes it easy to run SQL queries alongside complex analytic algorithms.
+ </p>
+ </div>
+ <div class="col-md-5 col-sm-5 col-padded-top col-center">
+
+ <div style="margin-top: 15px; text-align: left; display: inline-block;">
+ <div class="code">
+ sqlCtx = <span class="sparkop">HiveContext</span>(sc)<br/>
+ results = sqlCtx.<span class="sparkop">sql</span>(<br/>&nbsp;&nbsp;<span class="closure">"SELECT * FROM people"</span>)<br/>
+ names = results.<span class="sparkop">map</span>(<span class="closure">lambda p: p.name</span>)<br/>
+ </div>
+ <div class="caption">Apply functions to results of SQL queries.</div>
+ </div>
+ </div>
+</div>
+
+<div class="row row-padded">
+ <div class="col-md-7 col-sm-7">
+ <h2>Unified Data Access</h2>
+ <p class="lead">
+ Load and query data from a variety of sources.
+ </p>
+ <p>
+ SchemaRDDs provide a single interface for efficiently working with structured data, including Apache Hive tables, Parquet files and JSON files.
+ </p>
+ </div>
+ <div class="col-md-5 col-sm-5 col-padded-top col-center">
+ <div style="margin-top: 15px; text-align: left; display: inline-block;">
+ <div class="code">
+ sqlCtx.<span class="sparkop">jsonFile</span>(<span class="closure">"s3n://..."</span>)<br/>&nbsp;&nbsp;.registerAsTable("json")<br/>
+ schema_rdd = sqlCtx.<span class="sparkop">sql</span>(<span class="closure">"""<br/>
+ &nbsp;&nbsp;SELECT * <br/>
+ &nbsp;&nbsp;FROM hiveTable<br/>
+ &nbsp;&nbsp;JOIN json ..."""</span>)<br/>
+ </div>
+ <div class="caption">Query and join different data sources.</div>
+ </div>
+ </div>
+</div>
+
+<div class="row row-padded">
+ <div class="col-md-7 col-sm-7">
+ <h2>Hive Compatibility</h2>
+ <p class="lead">
+ Run unmodified Hive queries on existing warehouses.
+ </p>
+ <p>
+ Spark SQL reuses the Hive frontend and metastore, giving you full compatibility with
+ existing Hive data, queries, and UDFs. Simply install it alongside Hive.
+ </p>
+ </div>
+ <div class="col-md-5 col-sm-5 col-padded-top col-center">
+ <div style="width: 100%; max-width: 323px; display: inline-block">
+ <img src="{{site.url}}images/sql-hive-arch.png" style="width: 100%; max-width: 323px;">
+ <div class="caption">Spark SQL can use existing Hive metastores, SerDes, and UDFs.</div>
+ </div>
+ </div>
+</div>
+
+<div class="row row-padded">
+ <div class="col-md-7 col-sm-7">
+ <h2>Standard Connectivity</h2>
+ <p class="lead">
+ Connect through JDBC or ODBC.
+ </p>
+ <p>
+ Spark SQL includes a server mode with industry-standard JDBC and ODBC connectivity.
+ </p>
+ </div>
+ <div class="col-md-5 col-sm-5 col-padded-top col-center">
+ <div style="width: 100%; max-width: 323px; display: inline-block">
+ <img src="{{site.url}}images/jdbc.png" style="width: 75%; max-width: 323px;">
+ <div class="caption">Use your existing BI tools to query big data.</div>
+ </div>
+ </div>
+</div>
+
+<!--
+<div class="row row-padded">
+ <div class="col-md-7 col-sm-7">
+ <h2>Speed</h2>
+ <p class="lead">
+ Optimized to execute on Spark.
+ </p>
+ <p>
+ Spark SQL was built using the Catalyst optimizer, which automatically rewrites your queries to execute more efficiently.
+ By leveraging advanced techniques like runtime code generation, Spark SQL makes it easier to write lightning-fast analytic applications.
+ </p>
+ </div>
+ <div class="col-md-5 col-sm-5 col-padded-top col-center">
+ <div style="width: 100%; max-width: 272px; display: inline-block; text-align: center;">
+ <img src="{{site.url}}images/sqlperf.png" style="width: 100%; max-width: 250px;">
+ <div class="caption" style="min-width: 272px;">Performance comparison between Shark and Spark SQL</div>
+ </div>
+ </div>
+</div>
+-->
+
+{% extra %}
+
+
+<div class="row">
+ <div class="col-md-4 col-padded">
+ <h3>Scalability</h3>
+ <p>
+ Use the same engine for both interactive and long-running queries.
+ </p>
+ <p>
+ Spark SQL takes advantage of the RDD model to support mid-query fault tolerance, letting it scale to large jobs too.
+ Don't worry about using a different engine for historical data.
+ </p>
+ </div>
+
+ <div class="col-md-4 col-padded">
+ <h3>Community</h3>
+ <p>
+ Spark SQL is developed as part of Apache Spark. It thus gets
+ tested and updated with each Spark release.
+ </p>
+ <p>
+ If you have questions about the system, ask on the
+ <a href="{{site.url}}community.html#mailing-lists">Spark mailing lists</a>.
+ </p>
+ <p>
+ The Spark SQL developers welcome contributions. If you'd like to help out,
+ read <a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">how to
+ contribute to Spark</a>, and send us a patch!
+ </p>
+ </div>
+
+ <div class="col-md-4 col-padded">
+ <h3>Getting Started</h3>
+ <p>
+ To get started with Spark SQL:
+ </p>
+ <ul class="list-narrow">
+ <li><a href="{{site.url}}downloads.html">Download Spark</a>. It includes Spark SQL as a module.</li>
+ <li>Read the <a href="{{site.url}}docs/latest/sql-programming-guide.html">Spark SQL programming guide</a>, which includes examples of common use cases.</li>
+ </ul>
+ </div>
+</div>
+
+<div class="row">
+ <div class="col-sm-12 col-center">
+ <a href="{{site.url}}downloads.html" class="btn btn-success btn-lg btn-multiline">
+ Download Spark<br/><span class="small">Includes Spark SQL</span>
+ </a>
+ </div>
+</div>
+
+{% endextra %}