author    Cheng Lian <lian@databricks.com>    2014-12-01 13:09:51 -0800
committer Michael Armbrust <michael@databricks.com>    2014-12-01 13:10:20 -0800
commit    9c9b4bd1e4ac40c4abf4b5d1113c3056732e2c25 (patch)
tree      c23023109e73e5bb0a791748c984e3a8096fcabc /docs
parent    35bc338c04022354654435427bb310acdcb9904a (diff)
[SPARK-4258][SQL][DOC] Documents spark.sql.parquet.filterPushdown
Documents `spark.sql.parquet.filterPushdown`, explaining why it is turned off by default and when it is safe to turn on.

Author: Cheng Lian <lian@databricks.com>

Closes #3440 from liancheng/parquet-filter-pushdown-doc and squashes the following commits:

2104311 [Cheng Lian] Documents spark.sql.parquet.filterPushdown

(cherry picked from commit 5db8dcaf494e0dffed4fc22f19b0334d95ab6bfb)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Diffstat (limited to 'docs')
-rw-r--r--  docs/sql-programming-guide.md | 22
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 24a68bb083..96a3209c52 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -146,7 +146,7 @@ describes the various methods for loading data into a SchemaRDD.
Spark SQL supports two different methods for converting existing RDDs into SchemaRDDs. The first
method uses reflection to infer the schema of an RDD that contains specific types of objects. This
-reflection based approach leads to more concise code and works well when you already know the schema
+reflection based approach leads to more concise code and works well when you already know the schema
while writing your Spark application.
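Since the guide contrasts the two approaches only in prose here, the following is a minimal Scala sketch of the reflection-based route, assuming the Spark 1.2-era `SchemaRDD` API; the `Person` case class, input path, and variable names are illustrative, not part of the patch.

```scala
// A minimal sketch of the reflection-based approach (Spark 1.2-era API).
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

val sc = new SparkContext("local", "schema-inference-example") // or an existing SparkContext
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD // implicit conversion RDD[Person] -> SchemaRDD

// The schema (name: String, age: Int) is inferred from the Person case class.
val people = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))

people.registerTempTable("people")
val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
```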
The second method for creating SchemaRDDs is through a programmatic interface that allows you to
@@ -566,7 +566,7 @@ for teenName in teenNames.collect():
### Configuration
-Configuration of Parquet can be done using the `setConf` method on SQLContext or by running
+Configuration of Parquet can be done using the `setConf` method on SQLContext or by running
`SET key=value` commands using SQL.
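As a concrete illustration of the two configuration routes named above, here is a hedged Scala sketch; it assumes an existing `SQLContext` named `sqlContext`, and the option values shown are only examples.

```scala
// Programmatic route: setConf on an existing SQLContext (values are illustrative).
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

// SQL route: the same options can be set with SET key=value commands.
sqlContext.sql("SET spark.sql.parquet.compression.codec=snappy")
```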
<table class="table">
@@ -575,8 +575,8 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
<td><code>spark.sql.parquet.binaryAsString</code></td>
<td>false</td>
<td>
- Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do
- not differentiate between binary data and strings when writing out the Parquet schema. This
+ Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do
+ not differentiate between binary data and strings when writing out the Parquet schema. This
flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
</td>
</tr>
@@ -591,11 +591,21 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
<td><code>spark.sql.parquet.compression.codec</code></td>
<td>gzip</td>
<td>
- Sets the compression codec use when writing Parquet files. Acceptable values include:
+ Sets the compression codec used when writing Parquet files. Acceptable values include:
uncompressed, snappy, gzip, lzo.
</td>
</tr>
<tr>
+ <td><code>spark.sql.parquet.filterPushdown</code></td>
+ <td>false</td>
+ <td>
+ Turn on Parquet filter pushdown optimization. This feature is turned off by default because of a known
+ bug in Parquet 1.6.0rc3 (<a href="https://issues.apache.org/jira/browse/PARQUET-136">PARQUET-136</a>).
+ However, if your table doesn't contain any nullable string or binary columns, it's still safe to turn
+ this feature on.
+ </td>
+</tr>
+<tr>
<td><code>spark.sql.hive.convertMetastoreParquet</code></td>
<td>true</td>
<td>
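To make the newly documented option concrete, here is a hedged sketch of turning the pushdown on; it assumes an existing `SQLContext` named `sqlContext` and Parquet tables with no nullable string or binary columns, per the PARQUET-136 caveat above.

```scala
// Enable Parquet filter pushdown only when the tables involved contain no
// nullable string or binary columns (see PARQUET-136 referenced above).
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")

// Equivalent SQL form:
sqlContext.sql("SET spark.sql.parquet.filterPushdown=true")
```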
@@ -945,7 +955,7 @@ options.
## Migration Guide for Shark User
-### Scheduling
+### Scheduling
To set a [Fair Scheduler](job-scheduling.html#fair-scheduler-pools) pool for a JDBC client session,
users can set the `spark.sql.thriftserver.scheduler.pool` variable:
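The variable is set per JDBC client session; a hedged Scala sketch of doing so over JDBC follows. The connection URL, the Hive JDBC driver being on the classpath, and the pool name `accounting` are all assumptions for illustration.

```scala
// Hypothetical JDBC client session against the Thrift server; the URL and pool
// name are illustrative, and the Hive JDBC driver must be on the classpath.
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val stmt = conn.createStatement()
stmt.execute("SET spark.sql.thriftserver.scheduler.pool=accounting")
// Subsequent queries on this connection run in the `accounting` pool.
stmt.execute("SELECT 1")
```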