author    Cheng Lian <lian@databricks.com>    2014-12-01 13:09:51 -0800
committer Michael Armbrust <michael@databricks.com>    2014-12-01 13:10:20 -0800
commit    9c9b4bd1e4ac40c4abf4b5d1113c3056732e2c25 (patch)
tree      c23023109e73e5bb0a791748c984e3a8096fcabc /docs
parent    35bc338c04022354654435427bb310acdcb9904a (diff)
[SPARK-4258][SQL][DOC] Documents spark.sql.parquet.filterPushdown
Documents `spark.sql.parquet.filterPushdown`, explaining why it is turned off by default and when it is safe to turn on.

Author: Cheng Lian <lian@databricks.com>

Closes #3440 from liancheng/parquet-filter-pushdown-doc and squashes the following commits:

2104311 [Cheng Lian] Documents spark.sql.parquet.filterPushdown

(cherry picked from commit 5db8dcaf494e0dffed4fc22f19b0334d95ab6bfb)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Diffstat (limited to 'docs')
-rw-r--r--  docs/sql-programming-guide.md | 22
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 24a68bb083..96a3209c52 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -146,7 +146,7 @@ describes the various methods for loading data into a SchemaRDD.
Spark SQL supports two different methods for converting existing RDDs into SchemaRDDs. The first
method uses reflection to infer the schema of an RDD that contains specific types of objects. This
-reflection based approach leads to more concise code and works well when you already know the schema
+reflection based approach leads to more concise code and works well when you already know the schema
while writing your Spark application.
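Since the guide contrasts the two approaches only in prose here, the following is a minimal Scala sketch of the reflection-based route, assuming the Spark 1.2-era `SchemaRDD` API; the `Person` case class, input path, and variable names are illustrative, not part of the patch.

```scala
// A minimal sketch of the reflection-based approach (Spark 1.2-era API).
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

val sc = new SparkContext("local", "schema-inference-example") // or an existing SparkContext
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD // implicit conversion RDD[Person] -> SchemaRDD

// The schema (name: String, age: Int) is inferred from the Person case class.
val people = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))

people.registerTempTable("people")
val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
```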
The second method for creating SchemaRDDs is through a programmatic interface that allows you to
@@ -566,7 +566,7 @@ for teenName in teenNames.collect():
### Configuration
-Configuration of Parquet can be done using the `setConf` method on SQLContext or by running
+Configuration of Parquet can be done using the `setConf` method on SQLContext or by running
`SET key=value` commands using SQL.
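As a concrete illustration of the two configuration routes named above, here is a hedged Scala sketch; it assumes an existing `SQLContext` named `sqlContext`, and the option values shown are only examples.

```scala
// Programmatic route: setConf on an existing SQLContext (values are illustrative).
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

// SQL route: the same options can be set with SET key=value commands.
sqlContext.sql("SET spark.sql.parquet.compression.codec=snappy")
```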
<table class="table">
@@ -575,8 +575,8 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
<td><code>spark.sql.parquet.binaryAsString</code></td>
<td>false</td>
<td>
- Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do
- not differentiate between binary data and strings when writing out the Parquet schema. This
+ Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do
+ not differentiate between binary data and strings when writing out the Parquet schema. This
flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
</td>
</tr>
@@ -591,11 +591,21 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
<td><code>spark.sql.parquet.compression.codec</code></td>
<td>gzip</td>
<td>
- Sets the compression codec use when writing Parquet files. Acceptable values include:
+ Sets the compression codec used when writing Parquet files. Acceptable values include:
uncompressed, snappy, gzip, lzo.
</td>
</tr>
<tr>
+ <td><code>spark.sql.parquet.filterPushdown</code></td>
+ <td>false</td>
+ <td>
+ Turn on Parquet filter pushdown optimization. This feature is turned off by default because of a known
+ bug in Parquet 1.6.0rc3 (<a href="https://issues.apache.org/jira/browse/PARQUET-136">PARQUET-136</a>).
+ However, if your table doesn't contain any nullable string or binary columns, it's still safe to turn
+ this feature on.
+ </td>
+</tr>
+<tr>
<td><code>spark.sql.hive.convertMetastoreParquet</code></td>
<td>true</td>
<td>
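To make the newly documented option concrete, here is a hedged sketch of turning the pushdown on; it assumes an existing `SQLContext` named `sqlContext` and Parquet tables with no nullable string or binary columns, per the PARQUET-136 caveat above.

```scala
// Enable Parquet filter pushdown only when the tables involved contain no
// nullable string or binary columns (see PARQUET-136 referenced above).
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")

// Equivalent SQL form:
sqlContext.sql("SET spark.sql.parquet.filterPushdown=true")
```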
@@ -945,7 +955,7 @@ options.
## Migration Guide for Shark User
-### Scheduling
+### Scheduling
To set a [Fair Scheduler](job-scheduling.html#fair-scheduler-pools) pool for a JDBC client session,
users can set the `spark.sql.thriftserver.scheduler.pool` variable:
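The variable is set per JDBC client session; a hedged Scala sketch of doing so over JDBC follows. The connection URL, the Hive JDBC driver being on the classpath, and the pool name `accounting` are all assumptions for illustration.

```scala
// Hypothetical JDBC client session against the Thrift server; the URL and pool
// name are illustrative, and the Hive JDBC driver must be on the classpath.
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val stmt = conn.createStatement()
stmt.execute("SET spark.sql.thriftserver.scheduler.pool=accounting")
// Subsequent queries on this connection run in the `accounting` pool.
stmt.execute("SELECT 1")
```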