Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r-- docs/sql-programming-guide.md 30
1 files changed, 29 insertions, 1 deletions
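The note added by this change says the committer class must be set through the Hadoop `Configuration` rather than through `SQLConf`. A minimal sketch of what that might look like in Scala follows. The application name and the local master are illustrative assumptions; the property key and the committer class name are taken from the added documentation.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: a local SparkContext created only for illustration.
val conf = new SparkConf().setAppName("parquet-committer-demo").setMaster("local[*]")
val sc = new SparkContext(conf)

// Set the option on the Hadoop Configuration, not via SQLContext.setConf,
// as the added note requires.
sc.hadoopConfiguration.set(
  "spark.sql.parquet.output.committer.class",
  "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")
```

Whether the direct committer is appropriate depends on the target file system; the documentation only claims it can be more efficient when writing to S3.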
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 9107c9b676..2786e3d2cd 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1348,6 +1348,34 @@ Configuration of Parquet can be done using the `setConf` method on `SQLContext`
support.
</td>
</tr>
+<tr>
+ <td><code>spark.sql.parquet.output.committer.class</code></td>
+ <td><code>org.apache.parquet.hadoop.<br />ParquetOutputCommitter</code></td>
+ <td>
+ <p>
+ The output committer class used by Parquet. The specified class needs to be a subclass of
+ <code>org.apache.hadoop.<br />mapreduce.OutputCommitter</code>. Typically, it's also a
+ subclass of <code>org.apache.parquet.hadoop.ParquetOutputCommitter</code>.
+ </p>
+ <p>
+ <b>Note:</b>
+ <ul>
+ <li>
+ This option must be set via Hadoop <code>Configuration</code> rather than Spark
+ <code>SQLConf</code>.
+ </li>
+ <li>
+ This option overrides <code>spark.sql.sources.<br />outputCommitterClass</code>.
+ </li>
+ </ul>
+ </p>
+ <p>
+ Spark SQL comes with a built-in
+ <code>org.apache.spark.sql.<br />parquet.DirectParquetOutputCommitter</code>, which can be more
+ efficient than the default Parquet output committer when writing data to S3.
+ </p>
+ </td>
+</tr>
</table>
## JSON Datasets
@@ -1876,7 +1904,7 @@ that these options will be deprecated in future release as more optimizations ar
Configures the number of partitions to use when shuffling data for joins or aggregations.
</td>
</tr>
- <tr>
+ <tr>
<td><code>spark.sql.planner.externalSort</code></td>
<td>false</td>
<td>