author: 蒋星博 <jiangxingbo@meituan.com> 2016-07-14 00:21:27 +0800
committer: Cheng Lian <lian@databricks.com> 2016-07-14 00:21:27 +0800
commit: f376c37268848dbb4b2fb57677e22ef2bf207b49
tree: bc4fc046291880943c4d2c5ad37625a7548baa84 /python/pyspark/mllib/fpm.py
parent: ea06e4ef34c860219a9aeec81816ef53ada96253
[SPARK-16343][SQL] Improve the PushDownPredicate rule to pushdown predicates correctly in non-deterministic condition.
## What changes were proposed in this pull request?

Currently our Optimizer may reorder predicates to run them more efficiently. In a non-deterministic condition, however, changing the order between the deterministic parts and the non-deterministic parts may change the number of rows the non-deterministic expressions are evaluated on. For example:

```
SELECT a FROM t WHERE rand() < 0.1 AND a = 1
```

and

```
SELECT a FROM t WHERE a = 1 AND rand() < 0.1
```

may call rand() a different number of times, so the output rows differ.

This PR improves the rule by checking whether a predicate is placed before any non-deterministic predicates: only deterministic predicates that precede the first non-deterministic predicate are pushed down.

## How was this patch tested?

Expanded the related test cases in FilterPushdownSuite.

Author: 蒋星博 <jiangxingbo@meituan.com>

Closes #14012 from jiangxb1987/ppd.
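The ordering problem and the "deterministic prefix" idea can be illustrated with a minimal Python sketch. It is not Catalyst code: `Predicate`, `split_pushable`, `CountingRand`, `make_predicates`, `run_filter`, and `rand_calls` are hypothetical helpers, and Python's short-circuiting `all()` stands in for left-to-right evaluation of the `AND` chain. The sketch counts how often a seeded `rand()` is evaluated under each conjunct order, and splits a conjunct list so that only deterministic predicates appearing before the first non-deterministic one are treated as pushable.

```python
import random
from dataclasses import dataclass
from itertools import takewhile
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]


# Hypothetical stand-in for a Catalyst predicate (NOT Spark's API):
# a row test plus a flag saying whether it is deterministic.
@dataclass
class Predicate:
    name: str
    test: Callable[[Row], bool]
    deterministic: bool


def split_pushable(
    conjuncts: List[Predicate],
) -> Tuple[List[Predicate], List[Predicate]]:
    """Split an AND chain into (pushable, kept).

    Only deterministic predicates that come before the first
    non-deterministic one are pushable; the rest keep their original order.
    """
    pushable = list(takewhile(lambda p: p.deterministic, conjuncts))
    return pushable, conjuncts[len(pushable):]


class CountingRand:
    """Seeded stand-in for rand() that counts how often it is called."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)
        self.calls = 0

    def __call__(self) -> float:
        self.calls += 1
        return self.rng.random()


def make_predicates(rand: CountingRand) -> Tuple[Predicate, Predicate]:
    """Build the two conjuncts from the example: rand() < 0.1 and a = 1."""
    rand_lt = Predicate("rand() < 0.1", lambda r: rand() < 0.1, deterministic=False)
    a_eq_1 = Predicate("a = 1", lambda r: r["a"] == 1, deterministic=True)
    return rand_lt, a_eq_1


def run_filter(rows: List[Row], conjuncts: List[Predicate]) -> List[Row]:
    # all() short-circuits left to right, modeling an AND chain that stops
    # at the first false conjunct.
    return [r for r in rows if all(p.test(r) for p in conjuncts)]


ROWS: List[Row] = [{"a": i % 3} for i in range(30)]  # 10 rows have a = 1


def rand_calls(order: List[str]) -> int:
    """Count rand() calls when the conjuncts are evaluated in `order`."""
    rand = CountingRand(seed=0)
    rand_lt, a_eq_1 = make_predicates(rand)
    by_name = {"rand": rand_lt, "a": a_eq_1}
    run_filter(ROWS, [by_name[n] for n in order])
    return rand.calls


print(rand_calls(["rand", "a"]))  # → 30: rand() runs for every row
print(rand_calls(["a", "rand"]))  # → 10: rand() runs only where a = 1
```

Under this split, `rand() < 0.1 AND a = 1` yields no pushable predicates, while `a = 1 AND rand() < 0.1` yields `a = 1`. Pushing `a = 1` below the filter in the second case keeps `rand()` evaluated on exactly the rows the original order already gave it, so the row count seen by the non-deterministic predicate does not change.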
Diffstat (limited to 'python/pyspark/mllib/fpm.py')
0 files changed, 0 insertions, 0 deletions