[SPARK-13473][SQL] Simplifies PushPredicateThroughProject - spark

diff options

author	Cheng Lian <lian@databricks.com>	2016-03-22 19:20:56 +0800
committer	Cheng Lian <lian@databricks.com>	2016-03-22 19:20:56 +0800
commit	f2e855fba8eb73475cf312cdf880c1297d4323bb (patch)
tree	2573c5f19c05979757deb990abbbb10dc6c2be6c /python/pyspark/sql
parent	14464cadb9477be8b7f4c891ea990535ab6638ec (diff)
download	spark-f2e855fba8eb73475cf312cdf880c1297d4323bb.tar.gz spark-f2e855fba8eb73475cf312cdf880c1297d4323bb.tar.bz2 spark-f2e855fba8eb73475cf312cdf880c1297d4323bb.zip

[SPARK-13473][SQL] Simplifies PushPredicateThroughProject

## What changes were proposed in this pull request? This is a follow-up of PR #11348. After PR #11348, a predicate is never pushed through a project as long as the project contains any non-deterministic fields. Thus, it's impossible that the candidate filter condition can reference any non-deterministic projected fields, and related logic can be safely cleaned up. To be more specific, the following optimization is allowed: ```scala // From: df.select('a, 'b).filter('c > rand(42)) // To: df.filter('c > rand(42)).select('a, 'b) ``` while this isn't: ```scala // From: df.select('a, rand('b) as 'rb, 'c).filter('c > 'rb) // To: df.filter('c > rand('b)).select('a, rand('b) as 'rb, 'c) ``` ## How was this patch tested? Existing test cases should do the work. Author: Cheng Lian <lian@databricks.com> Closes #11864 from liancheng/spark-13473-cleanup.

Diffstat (limited to 'python/pyspark/sql')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: