author Josh Rosen <joshrosen@databricks.com> 2016-09-28 19:03:05 -0700
committer Herman van Hovell <hvanhovell@databricks.com> 2016-09-28 19:03:05 -0700
commit 37eb9184f1e9f1c07142c66936671f4711ef407d
tree b1582d97c09ddbc2805ca82e5cdbb905ed18373e /sql/catalyst/src/test
parent 7dfad4b132bc46263ef788ced4a935862f5c8756
[SPARK-17712][SQL] Fix invalid pushdown of data-independent filters beneath aggregates
## What changes were proposed in this pull request?

This patch fixes a minor correctness issue impacting the pushdown of filters beneath aggregates. Specifically, if a filter condition references no grouping or aggregate columns (e.g. `WHERE false`) then it would be incorrectly pushed beneath an aggregate.

Intuitively, the only case where you can push a filter beneath an aggregate is when that filter is deterministic and is defined over the grouping columns / expressions, since in that case the filter is acting to exclude entire groups from the query (like a `HAVING` clause). The existing code would only push deterministic filters beneath aggregates when all of the filter's references were grouping columns, but this logic missed the case where a filter has no references. For example, `WHERE false` is deterministic but is independent of the actual data.

This patch fixes this minor bug by adding a new check to ensure that we don't push filters beneath aggregates when those filters don't reference any columns.

## How was this patch tested?

New regression test in FilterPushdownSuite.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #15289 from JoshRosen/SPARK-17712.
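
The condition described above can be sketched as a small standalone model. This is an illustration only, not Spark's Catalyst code: `FilterCondition`, `PushdownSketch`, `canPushBeforeFix`, and `canPushAfterFix` are hypothetical names, and a filter is reduced to the set of column names it references plus a determinism flag.

```scala
// Hypothetical, simplified model of the pushdown decision; these are NOT
// Spark's Catalyst classes. A filter is reduced to the column names it
// references and a flag saying whether it is deterministic.
final case class FilterCondition(references: Set[String], deterministic: Boolean)

object PushdownSketch {
  // Check as the commit message describes the old behaviour: a deterministic
  // filter is pushed when all of its references are grouping columns. An
  // empty reference set is trivially a subset, so `WHERE false` qualifies.
  def canPushBeforeFix(cond: FilterCondition, groupingColumns: Set[String]): Boolean =
    cond.deterministic && cond.references.subsetOf(groupingColumns)

  // Check with the extra guard the commit message describes: filters that
  // reference no columns at all are never pushed beneath the aggregate.
  def canPushAfterFix(cond: FilterCondition, groupingColumns: Set[String]): Boolean =
    cond.deterministic &&
      cond.references.nonEmpty &&
      cond.references.subsetOf(groupingColumns)

  def main(args: Array[String]): Unit = {
    val grouping = Set("a")
    val whereFalse = FilterCondition(Set.empty, deterministic = true)
    val onGroupingColumn = FilterCondition(Set("a"), deterministic = true)

    println(s"WHERE false, old check: ${canPushBeforeFix(whereFalse, grouping)}")
    println(s"WHERE false, new check: ${canPushAfterFix(whereFalse, grouping)}")
    println(s"filter on 'a, new check: ${canPushAfterFix(onGroupingColumn, grouping)}")
  }
}
```

In this model the only difference between the two checks is the `nonEmpty` guard, which is what keeps a data-independent condition such as `WHERE false` above the aggregate while still allowing filters defined purely over grouping columns to move below it.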
Diffstat (limited to 'sql/catalyst/src/test')
-rw-r--r-- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala | 17
1 files changed, 17 insertions, 0 deletions
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala
index 55836f96f7..019f132d94 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala
@@ -687,6 +687,23 @@ class FilterPushdownSuite extends PlanTest {
     comparePlans(optimized, correctAnswer)
   }
 
+  test("SPARK-17712: aggregate: don't push down filters that are data-independent") {
+    val originalQuery = LocalRelation.apply(testRelation.output, Seq.empty)
+      .select('a, 'b)
+      .groupBy('a)(count('a))
+      .where(false)
+
+    val optimized = Optimize.execute(originalQuery.analyze)
+
+    val correctAnswer = testRelation
+      .select('a, 'b)
+      .groupBy('a)(count('a))
+      .where(false)
+      .analyze
+
+    comparePlans(optimized, correctAnswer)
+  }
+
   test("broadcast hint") {
     val originalQuery = BroadcastHint(testRelation)
       .where('a === 2L && 'b + Rand(10).as("rnd") === 3)