author Matei Alexandru Zaharia <matei@apache.org> 2015-07-25 23:10:48 +0000
committer Matei Alexandru Zaharia <matei@apache.org> 2015-07-25 23:10:48 +0000
commit f4fb827ef5aa831ace6f0ce21d6b02e83f409b63
tree 9c84b511d584f0f9cd3500f6a887fc92d8348955 /site/sql
parent 2de4e60511dad1ec7e4ac3974b14dcf85faaad50
Updates to SQL page
Diffstat (limited to 'site/sql')
-rw-r--r-- site/sql/index.html 33
1 files changed, 15 insertions, 18 deletions
diff --git a/site/sql/index.html b/site/sql/index.html
index e14fc7171..1f1b6ab0c 100644
--- a/site/sql/index.html
+++ b/site/sql/index.html
@@ -6,7 +6,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>
- Spark SQL | Apache Spark
+ Spark SQL &amp; DataFrames | Apache Spark
</title>
@@ -93,7 +93,7 @@
Libraries <b class="caret"></b>
</a>
<ul class="dropdown-menu">
- <li><a href="/sql/">Spark SQL</a></li>
+ <li><a href="/sql/">SQL and DataFrames</a></li>
<li><a href="/streaming/">Spark Streaming</a></li>
<li><a href="/mllib/">MLlib (machine learning)</a></li>
<li><a href="/graphx/">GraphX (graph)</a></li>
@@ -160,7 +160,7 @@
Built-in Libraries:
</p>
<ul class="list-none">
- <li><a href="/sql/">Spark SQL</a></li>
+ <li><a href="/sql/">SQL and DataFrames</a></li>
<li><a href="/streaming/">Spark Streaming</a></li>
<li><a href="/mllib/">MLlib (machine learning)</a></li>
<li><a href="/graphx/">GraphX (graph)</a></li>
@@ -181,8 +181,7 @@
Seamlessly mix SQL queries with Spark programs.
</p>
<p>
- Spark SQL lets you query structured data as a distributed dataset (RDD) in Spark, with integrated APIs in Python, Scala and Java.
- This tight integration makes it easy to run SQL queries alongside complex analytic algorithms.
+ Spark SQL lets you query structured data inside Spark programs, using either SQL or a familiar <a href="/docs/latest/sql-programming-guide.html">DataFrame API</a>. Usable in Java, Scala, Python and R.
</p>
</div>
<div class="col-md-5 col-sm-5 col-padded-top col-center">
@@ -200,19 +199,19 @@
<div class="row row-padded">
<div class="col-md-7 col-sm-7">
- <h2>Unified Data Access</h2>
+ <h2>Uniform Data Access</h2>
<p class="lead">
- Load and query data from a variety of sources.
+ Connect to any data source the same way.
</p>
<p>
- SchemaRDDs provide a single interface for efficiently working with structured data, including Apache Hive tables, parquet files and JSON files.
+ DataFrames and SQL provide a common way to access a variety of data sources, including Hive, Avro, Parquet, ORC, JSON, and JDBC. You can even join data across these sources.
</p>
</div>
<div class="col-md-5 col-sm-5 col-padded-top col-center">
<div style="margin-top: 15px; text-align: left; display: inline-block;">
<div class="code">
sqlCtx.<span class="sparkop">jsonFile</span>(<span class="closure">"s3n://..."</span>)<br />&nbsp;&nbsp;.registerAsTable("json")<br />
- schema_rdd = sqlCtx.<span class="sparkop">sql</span>(<span class="closure">"""<br />
+ results = sqlCtx.<span class="sparkop">sql</span>(<span class="closure">"""<br />
&nbsp;&nbsp;SELECT * <br />
&nbsp;&nbsp;FROM hiveTable<br />
&nbsp;&nbsp;JOIN json ..."""</span>)<br />
@@ -226,7 +225,7 @@
<div class="col-md-7 col-sm-7">
<h2>Hive Compatibility</h2>
<p class="lead">
- Run unmodified Hive queries on existing warehouses.
+ Run unmodified Hive queries on existing data.
</p>
<p>
Spark SQL reuses the Hive frontend and metastore, giving you full compatibility with
@@ -248,7 +247,7 @@
Connect through JDBC or ODBC.
</p>
<p>
- Spark SQL includes a server mode with industry standard JDBC and ODBC connectivity.
+ A server mode provides industry standard JDBC and ODBC connectivity for business intelligence tools.
</p>
</div>
<div class="col-md-5 col-sm-5 col-padded-top col-center">
@@ -288,13 +287,11 @@
<div class="row">
<div class="col-md-4 col-padded">
- <h3>Scalability</h3>
+ <h3>Performance &amp; Scalability</h3>
<p>
- Use the same engine for both interactive and long queries.
- </p>
- <p>
- Spark SQL takes advantage of the RDD model to support mid-query fault tolerance, letting it scale to large jobs too.
- Don't worry about using a different engine for historical data.
+ Spark SQL includes a cost-based optimizer, columnar storage and code generation to make queries fast.
+ At the same time, it scales to thousands of nodes and multi hour queries using the Spark engine, which provides full mid-query fault tolerance.
+ Don't worry about using a different engine for historical data.
</p>
</div>
@@ -322,7 +319,7 @@
</p>
<ul class="list-narrow">
<li><a href="/downloads.html">Download Spark</a>. It includes Spark SQL as a module.</li>
- <li>Read the <a href="/docs/latest/sql-programming-guide.html">Spark SQL programming guide</a>, which includes a examples of common use cases.</li>
+ <li>Read the <a href="/docs/latest/sql-programming-guide.html">Spark SQL and DataFrame guide</a> to learn the API.</li>
</ul>
</div>
</div>