aboutsummaryrefslogtreecommitdiff
path: root/launcher/src
diff options
context:
space:
mode:
authorCheolsoo Park <cheolsoop@netflix.com>2015-07-13 19:45:10 -0700
committerMichael Armbrust <michael@databricks.com>2015-07-13 19:45:10 -0700
commit408b384de96b9dbe94945753f7947fbe84272ae1 (patch)
treeda8d34af816eea0e75d8c70e83ae5fd954ba71f2 /launcher/src
parentb7bcbe25f90ba4e78b548465bc80d4de1d2c4a4a (diff)
downloadspark-408b384de96b9dbe94945753f7947fbe84272ae1.tar.gz
spark-408b384de96b9dbe94945753f7947fbe84272ae1.tar.bz2
spark-408b384de96b9dbe94945753f7947fbe84272ae1.zip
[SPARK-6910] [SQL] Support for pushing predicates down to metastore for partition pruning
This PR supersedes my old one #6921. Since my patch has changed quite a bit, I am opening a new PR to make it easier to review. The changes include- * Implement `toMetastoreFilter()` function in `HiveShim` that takes `Seq[Expression]` and converts them into a filter string for Hive metastore. * This functions matches all the `AttributeReference` + `BinaryComparisonOp` + `Integral/StringType` patterns in `Seq[Expression]` and fold them into a string. * Change `hiveQlPartitions` field in `MetastoreRelation` to `getHiveQlPartitions()` function that takes a filter string parameter. * Call `getHiveQlPartitions()` in `HiveTableScan` with a filter string. But there are some cases in which predicate pushdown is disabled- Case | Predicate pushdown ------- | ----------------------------- Hive integral and string types | Yes Hive varchar type | No Hive 0.13 and newer | Yes Hive 0.12 and older | No convertMetastoreParquet=false | Yes convertMetastoreParquet=true | No In case of `convertMetastoreParquet=true`, predicates are not pushed down because this conversion happens in an `Analyzer` rule (`HiveMetastoreCatalog.ParquetConversions`). At this point, `HiveTableScan` hasn't run, so predicates are not available. But reading the source code, I think it is intentional to convert the entire Hive table w/ all the partitions into `ParquetRelation` because then `ParquetRelation` can be cached and reused for any query against that table. Please correct me if I am wrong. cc marmbrus Author: Cheolsoo Park <cheolsoop@netflix.com> Closes #7216 from piaozhexiu/SPARK-6910-2 and squashes the following commits: aa1490f [Cheolsoo Park] Fix ordering of imports c212c4d [Cheolsoo Park] Incorporate review comments 5e93f9d [Cheolsoo Park] Predicate pushdown into Hive metastore
Diffstat (limited to 'launcher/src')
0 files changed, 0 insertions, 0 deletions