author | Cheng Lian <lian@databricks.com> | 2016-03-22 19:20:56 +0800 |
---|---|---|
committer | Cheng Lian <lian@databricks.com> | 2016-03-22 19:20:56 +0800 |
commit | f2e855fba8eb73475cf312cdf880c1297d4323bb (patch) | |
tree | 2573c5f19c05979757deb990abbbb10dc6c2be6c /python | |
parent | 14464cadb9477be8b7f4c891ea990535ab6638ec (diff) | |
[SPARK-13473][SQL] Simplifies PushPredicateThroughProject
## What changes were proposed in this pull request?
This is a follow-up of PR #11348.
After PR #11348, a predicate is never pushed through a project if the project contains any non-deterministic fields. Consequently, a candidate filter condition that reaches the push-down step can never reference a non-deterministic projected field, and the logic that handled that case can be safely removed.
To be more specific, the following optimization is allowed, because every projected field is deterministic even though the filter condition itself is not:
```scala
// From:
df.select('a, 'b).filter('c > rand(42))
// To:
df.filter('c > rand(42)).select('a, 'b)
```
while this isn't, because the project contains the non-deterministic field `rand('b) as 'rb`:
```scala
// From:
df.select('a, rand('b) as 'rb, 'c).filter('c > 'rb)
// To:
df.filter('c > rand('b)).select('a, rand('b) as 'rb, 'c)
```
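The guard described above can be sketched on a toy plan tree. This is only an illustrative model under stated assumptions, not Spark's actual Catalyst rule: `Expr`, `Col`, `Rand`, `Alias`, `GreaterThan`, `Relation`, `Project`, `Filter`, and `PushFilterThroughProject` are all hypothetical names defined here, and `Rand` takes a plain seed rather than a column.

```scala
// Toy expression model (hypothetical; not Catalyst's Expression hierarchy).
sealed trait Expr { def deterministic: Boolean }
case class Col(name: String) extends Expr { val deterministic = true }
case class Rand(seed: Long) extends Expr { val deterministic = false }
case class Alias(child: Expr, name: String) extends Expr {
  def deterministic: Boolean = child.deterministic
}
case class GreaterThan(left: Expr, right: Expr) extends Expr {
  def deterministic: Boolean = left.deterministic && right.deterministic
}

// Toy logical plan model (hypothetical; not Catalyst's LogicalPlan).
sealed trait Plan
case class Relation(name: String) extends Plan
case class Project(fields: Seq[Expr], child: Plan) extends Plan
case class Filter(condition: Expr, child: Plan) extends Plan

object PushFilterThroughProject {
  // Replace references to aliased output columns with the aliased expression,
  // so the condition is still valid once it sits below the project.
  def substitute(e: Expr, aliases: Map[String, Expr]): Expr = e match {
    case Col(n)            => aliases.getOrElse(n, e)
    case GreaterThan(l, r) => GreaterThan(substitute(l, aliases), substitute(r, aliases))
    case Alias(c, n)       => Alias(substitute(c, aliases), n)
    case other             => other
  }

  def apply(plan: Plan): Plan = plan match {
    // Push only when every projected field is deterministic. Under this guard
    // the condition cannot reference a non-deterministic projected field,
    // so no special handling for that case is needed.
    case Filter(cond, Project(fields, child)) if fields.forall(_.deterministic) =>
      val aliases = fields.collect { case Alias(c, n) => n -> c }.toMap
      Project(fields, Filter(substitute(cond, aliases), child))
    // Anything else, including a project with a non-deterministic field,
    // is left untouched.
    case other => other
  }
}
```

In this sketch, a `Filter` with condition `GreaterThan(Col("c"), Rand(42))` over `Project(Seq(Col("a"), Col("b")), ...)` is moved below the project, whereas a `Filter` over a project containing `Alias(Rand(42), "rb")` is returned unchanged.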
## How was this patch tested?
Existing test cases should cover this change.
Author: Cheng Lian <lian@databricks.com>
Closes #11864 from liancheng/spark-13473-cleanup.