author | Cheolsoo Park <cheolsoop@netflix.com> | 2015-07-13 19:45:10 -0700 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2015-07-13 19:45:10 -0700 |
commit | 408b384de96b9dbe94945753f7947fbe84272ae1 (patch) | |
tree | da8d34af816eea0e75d8c70e83ae5fd954ba71f2 /streaming/pom.xml | |
parent | b7bcbe25f90ba4e78b548465bc80d4de1d2c4a4a (diff) | |
[SPARK-6910] [SQL] Support for pushing predicates down to metastore for partition pruning
This PR supersedes my old one #6921. Since my patch has changed quite a bit, I am opening a new PR to make it easier to review.
The changes include:
* Implement a `toMetastoreFilter()` function in `HiveShim` that takes a `Seq[Expression]` and converts it into a filter string for the Hive metastore.
* This function matches all the `AttributeReference` + `BinaryComparisonOp` + `Integral/StringType` patterns in the `Seq[Expression]` and folds them into a string.
* Change the `hiveQlPartitions` field in `MetastoreRelation` to a `getHiveQlPartitions()` function that takes a filter string parameter.
* Call `getHiveQlPartitions()` in `HiveTableScan` with a filter string.
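
The folding step described in the list above can be sketched as follows. This is a minimal, self-contained illustration and not the code in this patch: the expression and type classes are hypothetical stand-ins for Catalyst's, and the set of operators, the double-quoting of string literals, and the `and` separator are assumptions made for the example.

```scala
// Hypothetical, simplified stand-ins for Catalyst classes.
// They are NOT Spark's real types; they only mirror the shapes named above.
sealed trait DataType
case object IntegralType extends DataType
case object StringType extends DataType
case object VarcharType extends DataType

sealed trait Expression
case class AttributeReference(name: String, dataType: DataType) extends Expression
case class Literal(value: Any, dataType: DataType) extends Expression
case class BinaryComparison(op: String, left: Expression, right: Expression) extends Expression
case class Other(description: String) extends Expression

object MetastoreFilterSketch {
  // Assumed set of comparison operators that can be pushed down.
  private val supportedOps = Set("=", "<", "<=", ">", ">=")

  // Assumed quoting style for string literals in the filter string.
  private def quote(s: String): String = "\"" + s + "\""

  // Keep only attribute-vs-literal comparisons on integral or string
  // columns and fold them into one filter string; anything else is skipped.
  def toMetastoreFilter(predicates: Seq[Expression]): String =
    predicates.collect {
      case BinaryComparison(op, AttributeReference(name, IntegralType), Literal(v, IntegralType))
          if supportedOps(op) =>
        s"$name $op $v"
      case BinaryComparison(op, AttributeReference(name, StringType), Literal(v, StringType))
          if supportedOps(op) =>
        s"$name $op ${quote(v.toString)}"
    }.mkString(" and ")

  def main(args: Array[String]): Unit = {
    val predicates = Seq(
      BinaryComparison("=", AttributeReference("ds", StringType), Literal("2015-07-13", StringType)),
      BinaryComparison(">=", AttributeReference("hr", IntegralType), Literal(12, IntegralType)),
      // A varchar column is not matched, so it contributes nothing to the filter.
      BinaryComparison("=", AttributeReference("region", VarcharType), Literal("us", StringType)),
      Other("udf(x) = 1")
    )
    println(toMetastoreFilter(predicates)) // prints: ds = "2015-07-13" and hr >= 12
  }
}
```

Predicates that do not match a supported pattern are dropped from the filter string rather than rejected, so in this sketch they would still have to be evaluated after the partitions are fetched.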
However, predicate pushdown is disabled in some cases:
Case | Predicate pushdown
------- | -----------------------------
Hive integral and string types | Yes
Hive varchar type | No
Hive 0.13 and newer | Yes
Hive 0.12 and older | No
convertMetastoreParquet=false | Yes
convertMetastoreParquet=true | No
In the case of `convertMetastoreParquet=true`, predicates are not pushed down because this conversion happens in an `Analyzer` rule (`HiveMetastoreCatalog.ParquetConversions`). At this point, `HiveTableScan` hasn't run, so predicates are not available. But from reading the source code, I think it is intentional to convert the entire Hive table with all its partitions into a `ParquetRelation`, because the `ParquetRelation` can then be cached and reused for any query against that table. Please correct me if I am wrong.
cc marmbrus
Author: Cheolsoo Park <cheolsoop@netflix.com>
Closes #7216 from piaozhexiu/SPARK-6910-2 and squashes the following commits:
aa1490f [Cheolsoo Park] Fix ordering of imports
c212c4d [Cheolsoo Park] Incorporate review comments
5e93f9d [Cheolsoo Park] Predicate pushdown into Hive metastore