author | Josh Rosen <joshrosen@databricks.com> | 2016-09-28 19:03:05 -0700 |
---|---|---|
committer | Herman van Hovell <hvanhovell@databricks.com> | 2016-09-28 19:03:05 -0700 |
commit | 37eb9184f1e9f1c07142c66936671f4711ef407d (patch) | |
tree | b1582d97c09ddbc2805ca82e5cdbb905ed18373e /external/docker | |
parent | 7dfad4b132bc46263ef788ced4a935862f5c8756 (diff) | |
[SPARK-17712][SQL] Fix invalid pushdown of data-independent filters beneath aggregates
## What changes were proposed in this pull request?
This patch fixes a minor correctness issue impacting the pushdown of filters beneath aggregates. Specifically, if a filter condition references no grouping or aggregate columns (e.g. `WHERE false`) then it would be incorrectly pushed beneath an aggregate.
Intuitively, the only case where you can push a filter beneath an aggregate is when that filter is deterministic and is defined over the grouping columns / expressions, since in that case the filter is acting to exclude entire groups from the query (like a `HAVING` clause). The existing code would only push deterministic filters beneath aggregates when all of the filter's references were grouping columns, but this logic missed the case where a filter has no references. For example, `WHERE false` is deterministic but is independent of the actual data.
This patch fixes the bug by adding a check that prevents filters which reference no columns from being pushed beneath aggregates.
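To make the failure mode concrete, here is a minimal sketch in plain Scala. It is a toy model, not Catalyst code: `Condition`, `oldCheck`, `newCheck`, and `globalCount` are hypothetical names introduced only for illustration. It shows why a subset test alone accepts a filter with no references (the empty set is a subset of every set), and why pushing `WHERE false` beneath a global aggregate changes the result.

```scala
// Toy model of the pushdown rule; none of these names are Spark APIs.
object PushdownSketch {
  // A filter condition reduced to the two properties the rule inspects.
  final case class Condition(references: Set[String], deterministic: Boolean)

  // Assumed shape of the old check: every referenced column is a grouping column.
  // An empty reference set passes, because Set.empty is a subset of any set.
  def oldCheck(cond: Condition, groupingCols: Set[String]): Boolean =
    cond.deterministic && cond.references.subsetOf(groupingCols)

  // Assumed shape of the fixed check: additionally require at least one reference.
  def newCheck(cond: Condition, groupingCols: Set[String]): Boolean =
    cond.deterministic &&
      cond.references.nonEmpty &&
      cond.references.subsetOf(groupingCols)

  // A global aggregate (no grouping columns) always yields exactly one row,
  // even when its input is empty.
  def globalCount(rows: Seq[Int]): Seq[Long] = Seq(rows.size.toLong)

  def main(args: Array[String]): Unit = {
    val whereFalse = Condition(references = Set.empty, deterministic = true)
    val onGroupCol = Condition(references = Set("a"), deterministic = true)

    assert(oldCheck(whereFalse, Set("a")))   // the bug: pushdown allowed
    assert(!newCheck(whereFalse, Set("a")))  // the fix: pushdown rejected
    assert(newCheck(onGroupCol, Set("a")))   // grouping-column filters still push

    val rows = Seq(1, 2, 3)
    // Filter applied above the aggregate: the single result row is dropped.
    val filterAbove = globalCount(rows).filter(_ => false)
    // Filter pushed beneath the aggregate: the count over empty input survives.
    val filterBelow = globalCount(rows.filter(_ => false))

    assert(filterAbove.isEmpty)
    assert(filterBelow == Seq(0L))
  }
}
```

In this model the two plans disagree: filtering after the aggregate returns no rows, while filtering before it returns a single row containing 0. That difference is the correctness issue the extra non-empty check avoids.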
## How was this patch tested?
New regression test in FilterPushdownSuite.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #15289 from JoshRosen/SPARK-17712.
Diffstat (limited to 'external/docker')
0 files changed, 0 insertions, 0 deletions