| Field | Value | Date |
|---|---|---|
| author | Lianhui Wang <lianhuiwang09@gmail.com> | 2016-07-12 18:52:15 +0200 |
| committer | Herman van Hovell <hvanhovell@databricks.com> | 2016-07-12 18:52:15 +0200 |
| commit | 5ad68ba5ce625c7005b540ca50ed001ca18de967 | |
| tree | fb8d1e7f11c9ac6f2c4a89a8b384a702d489c6a5 /docs/sql-programming-guide.md | |
| parent | 6cb75db9ab1a4f227069bec2763b89546b88b0ee | |
[SPARK-15752][SQL] Optimize metadata only query that has an aggregate whose children are deterministic project or filter operators.
## What changes were proposed in this pull request?
When a query uses only metadata (for example, partition keys), Spark can return the results from that metadata without scanning any data files. Hive implemented the same optimization in HIVE-1003.
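As a rough illustration of why no file scan is needed, the following Python sketch (not Spark's implementation) answers `SELECT DISTINCT dt` and `SELECT MAX(dt)` for a table partitioned by a column `dt`, using only a list of partition entries. The `Partition` class, the `dt` column, and the file names are hypothetical; the data files are listed but never opened.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical catalog entry for one partition of a table."""
    values: dict   # partition-column values, e.g. {"dt": "2016-07-11"}
    files: tuple   # data file paths; the metadata-only path never opens them


def distinct_partition_values(partitions, column):
    """Answer SELECT DISTINCT <column> from catalog metadata alone."""
    return sorted({p.values[column] for p in partitions})


def max_partition_value(partitions, column):
    """Answer SELECT MAX(<column>) from catalog metadata alone."""
    return max(p.values[column] for p in partitions)


# Hypothetical table partitioned by `dt`, with three partitions.
catalog = [
    Partition({"dt": "2016-07-10"}, ("dt=2016-07-10/part-0.parquet",)),
    Partition({"dt": "2016-07-11"}, ("dt=2016-07-11/part-0.parquet",
                                     "dt=2016-07-11/part-1.parquet")),
    Partition({"dt": "2016-07-12"}, ("dt=2016-07-12/part-0.parquet",)),
]

print(distinct_partition_values(catalog, "dt"))  # ['2016-07-10', '2016-07-11', '2016-07-12']
print(max_partition_value(catalog, "dt"))        # 2016-07-12
```

The sketch assumes every listed partition holds at least one row: a partition whose files are all empty would still contribute its value here, whereas a real scan would return nothing for it.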
## How was this patch tested?
Added unit tests.
Author: Lianhui Wang <lianhuiwang09@gmail.com>
Author: Wenchen Fan <wenchen@databricks.com>
Author: Lianhui Wang <lianhuiwang@users.noreply.github.com>
Closes #13494 from lianhuiwang/metadata-only.
Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r-- | docs/sql-programming-guide.md | 12 |
1 files changed, 12 insertions, 0 deletions
```diff
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 448251cfdc..e838a13af7 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1376,6 +1376,18 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </p>
   </td>
 </tr>
+<tr>
+  <td><code>spark.sql.optimizer.metadataOnly</code></td>
+  <td>true</td>
+  <td>
+    <p>
+      When true, enable the metadata-only query optimization that use the table's metadata to
+      produce the partition columns instead of table scans. It applies when all the columns scanned
+      are partition columns and the query has an aggregate operator that satisfies distinct
+      semantics.
+    </p>
+  </td>
+</tr>
 </table>
 
 ## JSON Datasets
```
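The applicability rule stated in the new option's description can be sketched as a small Python predicate. This is an illustration under assumptions, not the rule Spark implements: the `Agg` record and `metadata_only_applicable` function are hypothetical, and treating `min`/`max` (plus any aggregate over `DISTINCT` input) as the aggregates that satisfy distinct semantics is an assumption made for the example.

```python
from dataclasses import dataclass

# Assumption for this sketch: aggregates whose result does not change when
# duplicate input rows are removed.
DISTINCT_INSENSITIVE = {"min", "max"}


@dataclass
class Agg:
    """Hypothetical description of one aggregate call in a query."""
    func: str               # e.g. "max", "count", "sum"
    column: str
    distinct: bool = False  # True for COUNT(DISTINCT col), SUM(DISTINCT col), ...


def metadata_only_applicable(scanned_columns, partition_columns, aggregates):
    """Return True if an aggregate query could be answered from partition metadata."""
    # Condition 1: every column the query reads must be a partition column.
    if not set(scanned_columns) <= set(partition_columns):
        return False
    # Condition 2: every aggregate must satisfy distinct semantics, i.e. give the
    # same answer whether it sees one row or many rows per partition value.
    # An empty list (a plain GROUP BY on partition columns) passes trivially.
    return all(a.distinct or a.func in DISTINCT_INSENSITIVE for a in aggregates)


parts = ["dt", "country"]
print(metadata_only_applicable(["dt"], parts, [Agg("max", "dt")]))                    # True
print(metadata_only_applicable(["dt"], parts, [Agg("count", "dt")]))                  # False
print(metadata_only_applicable(["dt"], parts, [Agg("count", "dt", distinct=True)]))   # True
print(metadata_only_applicable(["dt", "amount"], parts, [Agg("max", "dt")]))          # False
```

An aggregate such as `COUNT(dt)` is rejected because its result depends on how many rows each partition holds, which the partition metadata does not record.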