From 41e0ffb19f678e9b1e87f747a5e4e3d44964e39a Mon Sep 17 00:00:00 2001
From: Takeshi YAMAMURO
Date: Tue, 21 Jun 2016 14:27:16 +0800
Subject: [SPARK-15894][SQL][DOC] Update docs for controlling #partitions

## What changes were proposed in this pull request?
Update docs for two parameters `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes` in Other Configuration Options.

## How was this patch tested?
N/A

Author: Takeshi YAMAMURO

Closes #13797 from maropu/SPARK-15894-2.
---
 docs/sql-programming-guide.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 4206f730de..ddf8f701ca 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -2015,6 +2015,23 @@ that these options will be deprecated in future release as more optimizations ar
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+  <tr>
+    <td><code>spark.sql.files.maxPartitionBytes</code></td>
+    <td>134217728 (128 MB)</td>
+    <td>
+      The maximum number of bytes to pack into a single partition when reading files.
+    </td>
+  </tr>
+  <tr>
+    <td><code>spark.sql.files.openCostInBytes</code></td>
+    <td>4194304 (4 MB)</td>
+    <td>
+      The estimated cost to open a file, measured by the number of bytes that could be scanned in the same
+      time. This is used when putting multiple files into a partition. It is better to overestimate it;
+      then the partitions with small files will be faster than partitions with bigger files (which are
+      scheduled first).
+    </td>
+  </tr>
   <tr>
     <td><code>spark.sql.autoBroadcastJoinThreshold</code></td>
     <td>10485760 (10 MB)</td>
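
As a rough illustration of how the two documented settings interact, the following Python sketch packs files into partitions. It is a simplified model, not Spark's actual implementation: the helper name `pack_files` is hypothetical, and the sketch assumes that files are processed largest first, that each file adds its size plus the open cost to the running partition size, and that files larger than the limit are not split.

```python
# Defaults taken from the documentation table in this patch.
MAX_PARTITION_BYTES = 134217728  # spark.sql.files.maxPartitionBytes (128 MB)
OPEN_COST_IN_BYTES = 4194304     # spark.sql.files.openCostInBytes (4 MB)


def pack_files(file_sizes,
               max_partition_bytes=MAX_PARTITION_BYTES,
               open_cost_in_bytes=OPEN_COST_IN_BYTES):
    """Greedily group file sizes (in bytes) into partitions.

    Hypothetical helper for illustration only; the packing rule is an
    assumption, not a copy of Spark's code.
    """
    partitions = []
    current = []
    current_size = 0
    # Assumption: bigger files are placed (and therefore scheduled) first.
    for size in sorted(file_sizes, reverse=True):
        # Start a new partition when the next file would exceed the limit.
        if current and current_size + size > max_partition_bytes:
            partitions.append(current)
            current = []
            current_size = 0
        current.append(size)
        # Each file is charged its own size plus the estimated open cost.
        current_size += size + open_cost_in_bytes
    if current:
        partitions.append(current)
    return partitions


MB = 1024 * 1024
small_files = [1 * MB] * 100  # one hundred 1 MB files

with_cost = pack_files(small_files)
without_cost = pack_files(small_files, open_cost_in_bytes=0)
print([len(p) for p in with_cost])     # files per partition with the 4 MB open cost
print([len(p) for p in without_cost])  # files per partition with no open cost
```

Under these assumptions, the 4 MB open cost spreads the hundred 1 MB files over four partitions (26, 26, 26 and 22 files), whereas an open cost of zero puts all of them into a single partition. This is the sense in which overestimating the open cost keeps partitions made of many small files light.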