author Cheng Lian <lian@databricks.com> 2014-12-17 12:48:04 -0800
committer Michael Armbrust <michael@databricks.com> 2014-12-17 12:48:04 -0800
commit 62771353767b5eecf2ec6c732cab07369d784df5 (patch)
tree cf1edff330e6af110c065ffcf4d6bce80f179733 /sql/catalyst
parent 7ad579ee972987863c09827554a6330aa54433b1 (diff)
[SPARK-4493][SQL] Don't pushdown Eq, NotEq, Lt, LtEq, Gt and GtEq predicates with nulls for Parquet
Predicates like `a = NULL` and `a < NULL` can't be pushed down, since Parquet `Lt`, `LtEq`, `Gt` and `GtEq` don't accept null values. Note that `Eq` and `NotEq` can only be used with `null` to represent predicates like `a IS NULL` and `a IS NOT NULL`.

However, normally this issue doesn't cause an NPE, because any value compared to `NULL` results in `NULL`, and Spark SQL automatically optimizes out `NULL` predicates in the `SimplifyFilters` rule. Only testing code that intentionally disables the optimizer may trigger this issue. (That's why this issue is not marked as a blocker, and I do **NOT** think we need to backport this to branch-1.1.)

This PR restricts `Lt`, `LtEq`, `Gt` and `GtEq` to non-null values only, and only uses `Eq` with a null value to push down `IsNull` and `IsNotNull`. It also adds support for the Parquet `NotEq` filter, for completeness and a (tiny) performance gain; `NotEq` is also used to push down `IsNotNull`.

Author: Cheng Lian <lian@databricks.com>

Closes #3367 from liancheng/filters-with-null and squashes the following commits:

cc41281 [Cheng Lian] Fixes several styling issues
de7de28 [Cheng Lian] Adds stricter rules for Parquet filters with null
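To illustrate the rule described above, here is a minimal, self-contained sketch of how an extractor shaped like the `NonNullLiteral` added in this commit could guard filter pushdown. The expression classes, `PushdownSketch`, and the string-based filter description are simplified stand-ins invented for this example; they are not Catalyst's real classes or the Parquet filter API.

```scala
// Simplified stand-ins for Catalyst types (assumptions, NOT the real Spark classes).
sealed trait DataType
case object IntegerType extends DataType

sealed trait Expression
case class Attribute(name: String) extends Expression
case class Literal(value: Any, dataType: DataType) extends Expression
case class EqualTo(left: Expression, right: Expression) extends Expression
case class LessThan(left: Expression, right: Expression) extends Expression
case class IsNull(child: Expression) extends Expression

// Same shape as the extractor in this commit: matches only when the value is non-null.
object NonNullLiteral {
  def unapply(literal: Literal): Option[(Any, DataType)] =
    Option(literal.value).map(_ => (literal.value, literal.dataType))
}

// Hypothetical pushdown helper: returns a textual filter description,
// or None when the predicate must not be pushed down.
object PushdownSketch {
  def pushdown(predicate: Expression): Option[String] = predicate match {
    // Comparisons are pushed down only for non-null literals.
    case EqualTo(Attribute(n), NonNullLiteral(v, _))  => Some(s"eq($n, $v)")
    case LessThan(Attribute(n), NonNullLiteral(v, _)) => Some(s"lt($n, $v)")
    // A null value is used only to express IS NULL.
    case IsNull(Attribute(n))                         => Some(s"eq($n, null)")
    // Everything else, including `a = NULL` and `a < NULL`, stays in Spark.
    case _                                            => None
  }
}
```

With these stand-ins, `PushdownSketch.pushdown(LessThan(Attribute("a"), Literal(null, IntegerType)))` yields `None`, so a null comparison never reaches the data source, while `IsNull(Attribute("a"))` is still translated.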
Diffstat (limited to 'sql/catalyst')
-rw-r--r-- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala 9
1 file changed, 9 insertions, 0 deletions
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
index 93c1932515..94e1d37c1c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
@@ -42,6 +42,15 @@ object Literal {
 }
 
 /**
+ * An extractor that matches non-null literal values
+ */
+object NonNullLiteral {
+  def unapply(literal: Literal): Option[(Any, DataType)] = {
+    Option(literal.value).map(_ => (literal.value, literal.dataType))
+  }
+}
+
+/**
  * Extractor for retrieving Int literals.
  */
 object IntegerLiteral {