author | 蒋星博 <jiangxingbo@meituan.com> | 2016-07-14 00:21:27 +0800 |
---|---|---|
committer | Cheng Lian <lian@databricks.com> | 2016-07-14 00:21:27 +0800 |
commit | f376c37268848dbb4b2fb57677e22ef2bf207b49 (patch) | |
tree | bc4fc046291880943c4d2c5ad37625a7548baa84 /licenses/LICENSE-cloudpickle.txt | |
parent | ea06e4ef34c860219a9aeec81816ef53ada96253 (diff) | |
[SPARK-16343][SQL] Improve the PushDownPredicate rule to pushdown predicates correctly in non-deterministic condition.
## What changes were proposed in this pull request?
Currently our Optimizer may reorder predicates to run them more efficiently. When a condition contains non-deterministic parts, however, changing the order between the deterministic and non-deterministic parts may change the number of input rows that the non-deterministic parts see. For example:
```SELECT a FROM t WHERE rand() < 0.1 AND a = 1```
And
```SELECT a FROM t WHERE a = 1 AND rand() < 0.1```
may call rand() a different number of times, and therefore the output rows may differ.
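The effect can be illustrated with a toy model. The sketch below is not Spark code: it assumes a simple row-at-a-time filter that evaluates the conjuncts left to right with short-circuiting, and the helper name `count_rand_calls` is hypothetical. It only shows why the position of `rand()` relative to `a = 1` changes how often `rand()` is invoked.

```python
import random


def count_rand_calls(rows, rand_first, seed=0):
    """Filter `rows` on `a == 1 AND rand() < 0.1` (toy model, not Spark).

    Conjuncts are evaluated left to right and short-circuit on the first
    False. Returns (number of rand() calls, rows kept).
    """
    rng = random.Random(seed)
    calls = 0
    kept = []
    for a in rows:
        def rand_pred():
            # Non-deterministic conjunct: every call consumes a random draw.
            nonlocal calls
            calls += 1
            return rng.random() < 0.1

        def eq_pred():
            # Deterministic conjunct.
            return a == 1

        preds = [rand_pred, eq_pred] if rand_first else [eq_pred, rand_pred]
        # all() over a generator stops at the first False result.
        if all(p() for p in preds):
            kept.append(a)
    return calls, kept


rows = [1, 2, 1, 3, 1, 2]
calls_rand_first, _ = count_rand_calls(rows, rand_first=True)
calls_eq_first, _ = count_rand_calls(rows, rand_first=False)
print(calls_rand_first, calls_eq_first)
```

With `rand()` first it is drawn once per input row; with `a = 1` first it is drawn only for rows that pass the equality check, so the two orders consume different random sequences.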
This PR fixes this by checking, before pushing a predicate down, whether it is placed before any non-deterministic predicate.
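One way to read the check is as a split of the conjunct list at the first non-deterministic predicate. The following is a minimal sketch of that idea, not the actual Catalyst implementation; the `Predicate` class and `split_pushdown_candidates` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from itertools import takewhile


@dataclass(frozen=True)
class Predicate:
    """Hypothetical stand-in for one conjunct of a filter condition."""
    sql: str
    deterministic: bool


def split_pushdown_candidates(conjuncts):
    """Split conjuncts at the first non-deterministic predicate.

    Only the leading run of deterministic predicates is treated as safe to
    push down; everything from the first non-deterministic predicate onward
    stays where it is, in its original order.
    """
    candidates = list(takewhile(lambda p: p.deterministic, conjuncts))
    remaining = list(conjuncts[len(candidates):])
    return candidates, remaining


a_eq_1 = Predicate("a = 1", deterministic=True)
rand_lt = Predicate("rand() < 0.1", deterministic=False)

# WHERE a = 1 AND rand() < 0.1: `a = 1` precedes rand(), so it is a candidate.
print(split_pushdown_candidates([a_eq_1, rand_lt]))
# WHERE rand() < 0.1 AND a = 1: nothing precedes rand(), so nothing moves.
print(split_pushdown_candidates([rand_lt, a_eq_1]))
```

Under this reading, a deterministic predicate written after `rand() < 0.1` is deliberately left in place, so the non-deterministic predicate still sees the same input rows.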
## How was this patch tested?
Expanded the related test cases in FilterPushdownSuite.
Author: 蒋星博 <jiangxingbo@meituan.com>
Closes #14012 from jiangxb1987/ppd.