aboutsummaryrefslogtreecommitdiff
path: root/sql/core/src/test/resources/sql-tests/results/outer-join.sql.out
diff options
context:
space:
mode:
authorgatorsmile <gatorsmile@gmail.com>2016-08-25 14:18:58 +0200
committerHerman van Hovell <hvanhovell@databricks.com>2016-08-25 14:18:58 +0200
commitd2ae6399ee2f0524b88262735adbbcb2035de8fd (patch)
tree2873fa10e507dc123d288e26c6c520932e4be765 /sql/core/src/test/resources/sql-tests/results/outer-join.sql.out
parent2b0cc4e0dfa4ffb9f21ff4a303015bc9c962d42b (diff)
downloadspark-d2ae6399ee2f0524b88262735adbbcb2035de8fd.tar.gz
spark-d2ae6399ee2f0524b88262735adbbcb2035de8fd.tar.bz2
spark-d2ae6399ee2f0524b88262735adbbcb2035de8fd.zip
[SPARK-16991][SPARK-17099][SPARK-17120][SQL] Fix Outer Join Elimination when Filter's isNotNull Constraints Unable to Filter Out All Null-supplying Rows
### What changes were proposed in this pull request? This PR is to fix an incorrect outer join elimination when filter's `isNotNull` constraints is unable to filter out all null-supplying rows. For example, `isnotnull(coalesce(b#227, c#238))`. Users can hit this error when they try to use `using/natural outer join`, which is converted to a normal outer join with a `coalesce` expression on the `using columns`. For example, ```Scala val a = Seq((1, 2), (2, 3)).toDF("a", "b") val b = Seq((2, 5), (3, 4)).toDF("a", "c") val c = Seq((3, 1)).toDF("a", "d") val ab = a.join(b, Seq("a"), "fullouter") ab.join(c, "a").explain(true) ``` The dataframe `ab` is doing `using full-outer join`, which is converted to a normal outer join with a `coalesce` expression. Constraints inference generates a `Filter` with constraints `isnotnull(coalesce(b#227, c#238))`. Then, it triggers a wrong outer join elimination and generates a wrong result. ``` Project [a#251, b#227, c#237, d#247] +- Join Inner, (a#251 = a#246) :- Project [coalesce(a#226, a#236) AS a#251, b#227, c#237] : +- Join FullOuter, (a#226 = a#236) : :- Project [_1#223 AS a#226, _2#224 AS b#227] : : +- LocalRelation [_1#223, _2#224] : +- Project [_1#233 AS a#236, _2#234 AS c#237] : +- LocalRelation [_1#233, _2#234] +- Project [_1#243 AS a#246, _2#244 AS d#247] +- LocalRelation [_1#243, _2#244] == Optimized Logical Plan == Project [a#251, b#227, c#237, d#247] +- Join Inner, (a#251 = a#246) :- Project [coalesce(a#226, a#236) AS a#251, b#227, c#237] : +- Filter isnotnull(coalesce(a#226, a#236)) : +- Join FullOuter, (a#226 = a#236) : :- LocalRelation [a#226, b#227] : +- LocalRelation [a#236, c#237] +- LocalRelation [a#246, d#247] ``` **A note to the `Committer`**, please also give the credit to dongjoon-hyun who submitted another PR for fixing this issue. https://github.com/apache/spark/pull/14580 ### How was this patch tested? Added test cases Author: gatorsmile <gatorsmile@gmail.com> Closes #14661 from gatorsmile/fixOuterJoinElimination.
Diffstat (limited to 'sql/core/src/test/resources/sql-tests/results/outer-join.sql.out')
-rw-r--r--sql/core/src/test/resources/sql-tests/results/outer-join.sql.out72
1 files changed, 72 insertions, 0 deletions
diff --git a/sql/core/src/test/resources/sql-tests/results/outer-join.sql.out b/sql/core/src/test/resources/sql-tests/results/outer-join.sql.out
new file mode 100644
index 0000000000..b39fdb0e58
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/results/outer-join.sql.out
@@ -0,0 +1,72 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 6
+
+
+-- !query 0
+CREATE OR REPLACE TEMPORARY VIEW t1 AS SELECT * FROM VALUES
+(-234), (145), (367), (975), (298)
+as t1(int_col1)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+CREATE OR REPLACE TEMPORARY VIEW t2 AS SELECT * FROM VALUES
+(-769, -244), (-800, -409), (940, 86), (-507, 304), (-367, 158)
+as t2(int_col0, int_col1)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+SELECT
+ (SUM(COALESCE(t1.int_col1, t2.int_col0))),
+ ((COALESCE(t1.int_col1, t2.int_col0)) * 2)
+FROM t1
+RIGHT JOIN t2
+ ON (t2.int_col0) = (t1.int_col1)
+GROUP BY GREATEST(COALESCE(t2.int_col1, 109), COALESCE(t1.int_col1, -449)),
+ COALESCE(t1.int_col1, t2.int_col0)
+HAVING (SUM(COALESCE(t1.int_col1, t2.int_col0)))
+ > ((COALESCE(t1.int_col1, t2.int_col0)) * 2)
+-- !query 2 schema
+struct<sum(coalesce(int_col1, int_col0)):bigint,(coalesce(int_col1, int_col0) * 2):int>
+-- !query 2 output
+-367 -734
+-507 -1014
+-769 -1538
+-800 -1600
+
+
+-- !query 3
+CREATE OR REPLACE TEMPORARY VIEW t1 AS SELECT * FROM VALUES (97) as t1(int_col1)
+-- !query 3 schema
+struct<>
+-- !query 3 output
+
+
+
+-- !query 4
+CREATE OR REPLACE TEMPORARY VIEW t2 AS SELECT * FROM VALUES (0) as t2(int_col1)
+-- !query 4 schema
+struct<>
+-- !query 4 output
+
+
+
+-- !query 5
+SELECT *
+FROM (
+SELECT
+ COALESCE(t2.int_col1, t1.int_col1) AS int_col
+ FROM t1
+ LEFT JOIN t2 ON false
+) t where (t.int_col) is not null
+-- !query 5 schema
+struct<int_col:int>
+-- !query 5 output
+97