[SPARK-18614][SQL] Incorrect predicate pushdown from ExistenceJoin

## What changes were proposed in this pull request? ExistenceJoin should be treated the same as LeftOuter and LeftAnti, not InnerLike and LeftSemi. This is not currently exposed because the rewrite of [NOT] EXISTS OR ... to ExistenceJoin happens in rule RewritePredicateSubquery, which is in a separate rule set and placed after the rule PushPredicateThroughJoin. During the transformation in the rule PushPredicateThroughJoin, an ExistenceJoin never exists. The semantics of ExistenceJoin says we need to preserve all the rows from the left table through the join operation as if it is a regular LeftOuter join. The ExistenceJoin augments the LeftOuter operation with a new column called exists, set to true when the join condition in the ON clause is true and false otherwise. The filter of any rows will happen in the Filter operation above the ExistenceJoin. Example: A(c1, c2): { (1, 1), (1, 2) } // B can be any value as it is irrelevant in this example B(c1): { (NULL) } select A.* from A where exists (select 1 from B where A.c1 = A.c2) or A.c2=2 In this example, the correct result is all the rows from A. If the pattern ExistenceJoin around line 935 in Optimizer.scala is indeed active, the code will push down the predicate A.c1 = A.c2 to be a Filter on relation A, which will incorrectly filter the row (1,2) from A. ## How was this patch tested? Since this is not an exposed case, no new test cases is added. The scenario is discovered via a code review of another PR and confirmed to be valid with peer. Author: Nattavut Sutyanyong <nsy.can@gmail.com> Closes #16044 from nsyca/spark-18614.
author: Nattavut Sutyanyong <nsy.can@gmail.com> 2016-11-29 15:27:43 -0800
committer: Herman van Hovell <hvanhovell@databricks.com> 2016-11-29 15:27:43 -0800
commit: 3600635215f25d695c9be5931b5185fec8a35527 (patch)
tree: 6cd7ff3ea42c27dbd1938af75f6de5b10160e0ab /sql/core/src/test/resources/sql-tests/results/pred-pushdown.sql.out
parent: f8878a4c6f7c4ebb16e4aef26ad0869ba12eb9fc (diff)
download: spark-3600635215f25d695c9be5931b5185fec8a35527.tar.gz
spark-3600635215f25d695c9be5931b5185fec8a35527.tar.bz2
spark-3600635215f25d695c9be5931b5185fec8a35527.zip
1 files changed, 40 insertions, 0 deletions
diff --git a/sql/core/src/test/resources/sql-tests/results/pred-pushdown.sql.out b/sql/core/src/test/resources/sql-tests/results/pred-pushdown.sql.out
new file mode 100644
index 0000000000..1b8ddbe4c7
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/results/pred-pushdown.sql.out
@@ -0,0 +1,40 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 4
+
+
+-- !query 0
+CREATE OR REPLACE TEMPORARY VIEW tbl_a AS VALUES (1, 1), (2, 1), (3, 6) AS T(c1, c2)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+CREATE OR REPLACE TEMPORARY VIEW tbl_b AS VALUES 1 AS T(c1)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+SELECT *
+FROM   tbl_a
+       LEFT ANTI JOIN tbl_b ON ((tbl_a.c1 = tbl_a.c2) IS NULL OR tbl_a.c1 = tbl_a.c2)
+-- !query 2 schema
+struct<c1:int,c2:int>
+-- !query 2 output
+2	1
+3	6
+
+
+-- !query 3
+SELECT l.c1, l.c2
+FROM   tbl_a l
+WHERE  EXISTS (SELECT 1 FROM tbl_b r WHERE l.c1 = l.c2) OR l.c2 < 2
+-- !query 3 schema
+struct<c1:int,c2:int>
+-- !query 3 output
+1	1
+2	1
author	Nattavut Sutyanyong <nsy.can@gmail.com>	2016-11-29 15:27:43 -0800
committer	Herman van Hovell <hvanhovell@databricks.com>	2016-11-29 15:27:43 -0800
commit	3600635215f25d695c9be5931b5185fec8a35527 (patch)
tree	6cd7ff3ea42c27dbd1938af75f6de5b10160e0ab /sql/core/src/test/resources/sql-tests/results/pred-pushdown.sql.out
parent	f8878a4c6f7c4ebb16e4aef26ad0869ba12eb9fc (diff)
download	spark-3600635215f25d695c9be5931b5185fec8a35527.tar.gz spark-3600635215f25d695c9be5931b5185fec8a35527.tar.bz2 spark-3600635215f25d695c9be5931b5185fec8a35527.zip