author | Cheng Lian <lian@databricks.com> | 2014-12-17 12:48:04 -0800 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2014-12-17 12:48:04 -0800 |
commit | 62771353767b5eecf2ec6c732cab07369d784df5 (patch) | |
tree | cf1edff330e6af110c065ffcf4d6bce80f179733 /sql/catalyst | |
parent | 7ad579ee972987863c09827554a6330aa54433b1 (diff) | |
[SPARK-4493][SQL] Don't pushdown Eq, NotEq, Lt, LtEq, Gt and GtEq predicates with nulls for Parquet
Predicates like `a = NULL` and `a < NULL` can't be pushed down because the Parquet `Lt`, `LtEq`, `Gt` and `GtEq` filters don't accept null values. Note that `Eq` and `NotEq` can be used with `null` only to represent predicates like `a IS NULL` and `a IS NOT NULL`.
However, this issue normally doesn't cause an NPE, because any value compared to `NULL` results in `NULL`, and Spark SQL automatically optimizes out `NULL` predicates in the `SimplifyFilters` rule. Only testing code that intentionally disables the optimizer may trigger this issue. (That's why this issue is not marked as a blocker, and I do **NOT** think we need to backport this to branch-1.1.)
This PR restricts `Lt`, `LtEq`, `Gt` and `GtEq` to non-null values only, and uses `Eq` with a null value only to push down `IsNull` and `IsNotNull`. It also adds support for the Parquet `NotEq` filter, for completeness and a (tiny) performance gain; `NotEq` is also used to push down `IsNotNull`.
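The rules described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: every type here (`Pred`, `ParquetFilter`, `ParquetEq`, and so on) is a hypothetical stand-in for the Catalyst and Parquet classes, and the mapping of `IsNotNull` to a `NotEq` with a null value is an assumption based on the description.

```scala
// Hypothetical, simplified stand-ins for Catalyst predicates (not the real classes).
sealed trait Pred
case class EqualTo(column: String, value: Any) extends Pred
case class LessThan(column: String, value: Any) extends Pred
case class IsNull(column: String) extends Pred
case class IsNotNull(column: String) extends Pred

// Hypothetical, simplified stand-ins for Parquet-side filters.
sealed trait ParquetFilter
case class ParquetEq(column: String, value: Any) extends ParquetFilter
case class ParquetNotEq(column: String, value: Any) extends ParquetFilter
case class ParquetLt(column: String, value: Any) extends ParquetFilter

object PushdownSketch {
  // Returns Some(filter) when the predicate can be pushed down, None otherwise.
  def toParquetFilter(pred: Pred): Option[ParquetFilter] = pred match {
    // A null value is only allowed to express IS NULL / IS NOT NULL.
    case IsNull(c)                   => Some(ParquetEq(c, null))
    case IsNotNull(c)                => Some(ParquetNotEq(c, null))
    // Comparisons are pushed down only when the literal is non-null.
    case EqualTo(c, v) if v != null  => Some(ParquetEq(c, v))
    case LessThan(c, v) if v != null => Some(ParquetLt(c, v))
    // `a = NULL`, `a < NULL`, etc. are left for Spark SQL to evaluate.
    case _                           => None
  }
}
```

In this sketch a predicate such as `EqualTo("a", null)` yields `None`, so it stays in the Spark-side plan instead of reaching Parquet.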
Author: Cheng Lian <lian@databricks.com>
Closes #3367 from liancheng/filters-with-null and squashes the following commits:
cc41281 [Cheng Lian] Fixes several styling issues
de7de28 [Cheng Lian] Adds stricter rules for Parquet filters with null
Diffstat (limited to 'sql/catalyst')
-rw-r--r-- | sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
index 93c1932515..94e1d37c1c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
@@ -42,6 +42,15 @@ object Literal {
 }
 
 /**
+ * An extractor that matches non-null literal values
+ */
+object NonNullLiteral {
+  def unapply(literal: Literal): Option[(Any, DataType)] = {
+    Option(literal.value).map(_ => (literal.value, literal.dataType))
+  }
+}
+
+/**
  * Extractor for retrieving Int literals.
  */
 object IntegerLiteral {
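To illustrate how an extractor like the one added in this diff behaves in a pattern match, here is a small self-contained sketch. The `Literal` case class below is a hypothetical simplification (its `dataType` is a plain `String` rather than Catalyst's `DataType`), and `ExtractorDemo` is an invented helper; only the `Option(...)`-based `unapply` mirrors the diff.

```scala
// Hypothetical, simplified stand-in for Catalyst's Literal; not the real class.
case class Literal(value: Any, dataType: String)

object NonNullLiteral {
  // Option(x) is None when x is null, so null literals fail to match.
  def unapply(literal: Literal): Option[(Any, String)] =
    Option(literal.value).map(v => (v, literal.dataType))
}

object ExtractorDemo {
  // Invented helper showing the extractor used in a pattern match.
  def describe(literal: Literal): String = literal match {
    case NonNullLiteral(v, t) => s"non-null $t literal: $v"
    case _                    => "null literal"
  }
}
```

With these definitions, `ExtractorDemo.describe(Literal(null, "IntegerType"))` falls through to the default case, which is the property the pushdown code relies on to skip null-valued comparisons.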