diff options
author | Sameer Agarwal <sameerag@cs.berkeley.edu> | 2016-08-26 16:40:59 -0700 |
---|---|---|
committer | Yin Huai <yhuai@databricks.com> | 2016-08-26 16:40:59 -0700 |
commit | 540e91280147a61727f99592a66c0cbb12328fac (patch) | |
tree | 47984ed7ef3e1a7023a45415bdf86a06c50be44d /sql/catalyst/src/test | |
parent | f64a1ddd09a34d5d867ccbaba46204d75fad038d (diff) | |
download | spark-540e91280147a61727f99592a66c0cbb12328fac.tar.gz spark-540e91280147a61727f99592a66c0cbb12328fac.tar.bz2 spark-540e91280147a61727f99592a66c0cbb12328fac.zip |
[SPARK-17244] Catalyst should not pushdown non-deterministic join conditions
## What changes were proposed in this pull request?
Given that non-deterministic expressions can be stateful, pushing them down the query plan during the optimization phase can cause incorrect behavior. This patch fixes that issue by explicitly disabling that.
## How was this patch tested?
A new test in `FilterPushdownSuite` that checks catalyst behavior for both deterministic and non-deterministic join conditions.
Author: Sameer Agarwal <sameerag@cs.berkeley.edu>
Closes #14815 from sameeragarwal/constraint-inputfile.
Diffstat (limited to 'sql/catalyst/src/test')
-rw-r--r-- | sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala | 14 |
1 files changed, 14 insertions, 0 deletions
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala index 9f25e9d8e9..55836f96f7 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala @@ -987,4 +987,18 @@ class FilterPushdownSuite extends PlanTest { comparePlans(Optimize.execute(originalQuery.analyze), correctAnswer) } + + test("join condition pushdown: deterministic and non-deterministic") { + val x = testRelation.subquery('x) + val y = testRelation.subquery('y) + + // Verify that all conditions preceding the first non-deterministic condition are pushed down + // by the optimizer and others are not. + val originalQuery = x.join(y, condition = Some("x.a".attr === 5 && "y.a".attr === 5 && + "x.a".attr === Rand(10) && "y.b".attr === 5)) + val correctAnswer = x.where("x.a".attr === 5).join(y.where("y.a".attr === 5), + condition = Some("x.a".attr === Rand(10) && "y.b".attr === 5)) + + comparePlans(Optimize.execute(originalQuery.analyze), correctAnswer.analyze) + } } |