author: Takeshi YAMAMURO <linguin.m.s@gmail.com> 2016-06-21 14:27:16 +0800
committer: Cheng Lian <lian@databricks.com> 2016-06-21 14:27:16 +0800
commit: 41e0ffb19f678e9b1e87f747a5e4e3d44964e39a
tree: cfe45898cdd0d4274850268b84c2d205daa000e9 /docs/sql-programming-guide.md
parent: 58f6e27dd70f476f99ac8204e6b405bced4d6de1
[SPARK-15894][SQL][DOC] Update docs for controlling #partitions
## What changes were proposed in this pull request?

Update docs for two parameters `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes` in Other Configuration Options.

## How was this patch tested?

N/A

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #13797 from maropu/SPARK-15894-2.
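
To illustrate how the two parameters might interact, here is a minimal sketch in plain Python. It is a simplified model, not Spark's actual implementation: the function name `pack_files` and the greedy strategy (largest files first, each file charged its size plus the open cost) are assumptions made for illustration, and files larger than the partition limit are not split here.

```python
from typing import List

MAX_PARTITION_BYTES = 134217728  # default of spark.sql.files.maxPartitionBytes (128 MB)
OPEN_COST_IN_BYTES = 4194304     # default of spark.sql.files.openCostInBytes (4 MB)


def pack_files(file_sizes: List[int],
               max_partition_bytes: int = MAX_PARTITION_BYTES,
               open_cost_in_bytes: int = OPEN_COST_IN_BYTES) -> List[List[int]]:
    """Greedily pack files into partitions (hypothetical, simplified model).

    Each file added to a partition is charged its own size plus the
    estimated open cost, so many small files fill a partition sooner
    than their raw byte count alone would suggest.
    """
    partitions: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    # Assumption: larger files are considered first.
    for size in sorted(file_sizes, reverse=True):
        # Close the current partition if this file would overflow it.
        if current and current_size + size > max_partition_bytes:
            partitions.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size + open_cost_in_bytes
    if current:
        partitions.append(current)
    return partitions


one_mb = 1024 * 1024
with_cost = pack_files([one_mb] * 100)
without_cost = pack_files([one_mb] * 100, open_cost_in_bytes=0)
print(len(with_cost), len(without_cost))
```

Under this model, 100 files of 1 MB each land in 4 partitions with the default open cost, but fit into a single partition when the open cost is set to 0. A higher open cost therefore spreads small files over more partitions.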
Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r-- docs/sql-programming-guide.md | 17
1 files changed, 17 insertions, 0 deletions
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 4206f730de..ddf8f701ca 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -2016,6 +2016,23 @@ that these options will be deprecated in future release as more optimizations ar
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
+ <td><code>spark.sql.files.maxPartitionBytes</code></td>
+ <td>134217728 (128 MB)</td>
+ <td>
+ The maximum number of bytes to pack into a single partition when reading files.
+ </td>
+ </tr>
+ <tr>
+ <td><code>spark.sql.files.openCostInBytes</code></td>
+ <td>4194304 (4 MB)</td>
+ <td>
+ The estimated cost to open a file, measured by the number of bytes that could be scanned in the
+ same time. This is used when putting multiple files into a partition. It is better to
+ overestimate; then the partitions with small files will be faster than partitions with bigger
+ files (which are scheduled first).
+ </td>
+ </tr>
+ <tr>
<td><code>spark.sql.autoBroadcastJoinThreshold</code></td>
<td>10485760 (10 MB)</td>
<td>