author | Stan Zhai <zhaishidan@haizhi.com> | 2017-03-01 07:52:35 -0800 |
---|---|---|
committer | Xiao Li <gatorsmile@gmail.com> | 2017-03-01 07:52:35 -0800 |
commit | 5502a9cf883b2058209904c152e5d2c2a106b072 (patch) | |
tree | d23f88fb04419a6c08e41f9b3531b62f0f9b3a0c /sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out | |
parent | 38e7835347a2e1803b1df5e73cf8b749951b11b2 (diff) | |
download | spark-5502a9cf883b2058209904c152e5d2c2a106b072.tar.gz spark-5502a9cf883b2058209904c152e5d2c2a106b072.tar.bz2 spark-5502a9cf883b2058209904c152e5d2c2a106b072.zip |
[SPARK-19766][SQL] Constant alias columns in INNER JOIN should not be folded by FoldablePropagation rule
## What changes were proposed in this pull request?
This PR fixes a bug in the optimizer phase where the constant alias columns of an `INNER JOIN` query are folded by the `FoldablePropagation` rule.
Consider the following queries:
```
val sqlA =
"""
|create temporary view ta as
|select a, 'a' as tag from t1 union all
|select a, 'b' as tag from t2
""".stripMargin
val sqlB =
"""
|create temporary view tb as
|select a, 'a' as tag from t3 union all
|select a, 'b' as tag from t4
""".stripMargin
val sql =
"""
|select tb.* from ta inner join tb on
|ta.a = tb.a and
|ta.tag = tb.tag
""".stripMargin
```
The `tag` column is a constant alias column, so `FoldablePropagation` folds it like this:
```
TRACE SparkOptimizer:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.FoldablePropagation ===
 Project [a#4, tag#14]                              Project [a#4, tag#14]
!+- Join Inner, ((a#0 = a#4) && (tag#8 = tag#14))   +- Join Inner, ((a#0 = a#4) && (a = a))
    :- Union                                           :- Union
    :  :- Project [a#0, a AS tag#8]                    :  :- Project [a#0, a AS tag#8]
    :  :  +- LocalRelation [a#0]                       :  :  +- LocalRelation [a#0]
    :  +- Project [a#2, b AS tag#9]                    :  +- Project [a#2, b AS tag#9]
    :     +- LocalRelation [a#2]                       :     +- LocalRelation [a#2]
    +- Union                                           +- Union
       :- Project [a#4, a AS tag#14]                      :- Project [a#4, a AS tag#14]
       :  +- LocalRelation [a#4]                          :  +- LocalRelation [a#4]
       +- Project [a#6, b AS tag#15]                      +- Project [a#6, b AS tag#15]
          +- LocalRelation [a#6]                             +- LocalRelation [a#6]
```
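In the trace above, `tag#8` and `tag#14` are the alias attributes of the first branch of each `Union`, while the second branch aliases a different constant (`b`). The following is a minimal sketch in plain Scala of why such a column cannot be treated as a literal. It is a toy model only: `Col`, `Const`, `NonConst` and `foldedValue` are hypothetical names, not Catalyst classes, and this is not the code of the fix.

```scala
// Toy model (plain Scala, no Catalyst classes) of a column produced by UNION ALL.
object UnionFoldabilitySketch {
  // A projected column is either a constant alias or a non-constant expression.
  sealed trait Col
  final case class Const(value: String) extends Col
  case object NonConst extends Col

  // Hypothetical check: a Union output column may be replaced by a literal
  // only if every branch projects the same constant.
  def foldedValue(branches: Seq[Col]): Option[String] = branches match {
    case Const(v) +: rest if rest.forall(_ == Const(v)) => Some(v)
    case _ => None
  }

  // `tag` in view ta: 'a' from the t1 branch, 'b' from the t2 branch.
  val taTag: Seq[Col] = Seq(Const("a"), Const("b"))

  def main(args: Array[String]): Unit = {
    println(foldedValue(taTag))                       // None: not safe to fold
    println(foldedValue(Seq(Const("a"), Const("a")))) // Some(a)
  }
}
```

Looking only at the first branch, as the trace suggests happened here, would yield `a` for both sides of the join and turn `tag#8 = tag#14` into `a = a`.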
Finally, the result of the `Operator Optimizations` batch is:
```
 Project [a#4, tag#14]                              Project [a#4, tag#14]
!+- Join Inner, ((a#0 = a#4) && (tag#8 = tag#14))   +- Join Inner, (a#0 = a#4)
!   :- SubqueryAlias ta, `ta`                          :- Union
!   :  +- Union                                        :  :- LocalRelation [a#0]
!   :     :- Project [a#0, a AS tag#8]                 :  +- LocalRelation [a#2]
!   :     :  +- SubqueryAlias t1, `t1`                 +- Union
!   :     :     +- Project [a#0]                          :- LocalRelation [a#4, tag#14]
!   :     :        +- SubqueryAlias grouping              +- LocalRelation [a#6, tag#15]
!   :     :           +- LocalRelation [a#0]
!   :     +- Project [a#2, b AS tag#9]
!   :        +- SubqueryAlias t2, `t2`
!   :           +- Project [a#2]
!   :              +- SubqueryAlias grouping
!   :                 +- LocalRelation [a#2]
!   +- SubqueryAlias tb, `tb`
!      +- Union
!         :- Project [a#4, a AS tag#14]
!         :  +- SubqueryAlias t3, `t3`
!         :     +- Project [a#4]
!         :        +- SubqueryAlias grouping
!         :           +- LocalRelation [a#4]
!         +- Project [a#6, b AS tag#15]
!            +- SubqueryAlias t4, `t4`
!               +- Project [a#6]
!                  +- SubqueryAlias grouping
!                     +- LocalRelation [a#6]
```
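The folded predicate `(a = a)` is always true, so the optimizer drops it and the condition `tag#8 = tag#14` disappears from the `INNER JOIN`. The join then returns wrong results, because rows whose tags differ are matched as well.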
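To make the effect on the result concrete, here is a small sketch that models the two views with plain Scala collections instead of Spark. It assumes each of `t1` to `t4` holds the single row `a = 1`; the PR description does not show the table contents, so that data and the names `Row`, `correct` and `folded` are illustrative only.

```scala
// Toy model of the query using plain Scala collections (no Spark required).
// Assumption: each of t1..t4 contains exactly one row with a = 1.
object InnerJoinFoldingSketch {
  final case class Row(a: Int, tag: String)

  // ta = (t1 tagged 'a') UNION ALL (t2 tagged 'b'); tb is built the same way from t3 and t4.
  val ta: Seq[Row] = Seq(Row(1, "a"), Row(1, "b"))
  val tb: Seq[Row] = Seq(Row(1, "a"), Row(1, "b"))

  // Intended semantics: ta.a = tb.a AND ta.tag = tb.tag
  val correct: Seq[Row] =
    for (l <- ta; r <- tb if l.a == r.a && l.tag == r.tag) yield r

  // Semantics after the bad rewrite: the tag predicate was folded away,
  // so only ta.a = tb.a remains.
  val folded: Seq[Row] =
    for (l <- ta; r <- tb if l.a == r.a) yield r

  def main(args: Array[String]): Unit = {
    println(s"correct rows: ${correct.size}") // 2
    println(s"folded rows:  ${folded.size}")  // 4
  }
}
```

Under that assumption the correct join yields two rows, while the plan without the tag condition yields four, pairing `'a'` rows with `'b'` rows.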
After the fix:
```
=== Result of Batch LocalRelation ===
 GlobalLimit 21                                           GlobalLimit 21
 +- LocalLimit 21                                         +- LocalLimit 21
    +- Project [a#4, tag#11]                                 +- Project [a#4, tag#11]
       +- Join Inner, ((a#0 = a#4) && (tag#8 = tag#11))         +- Join Inner, ((a#0 = a#4) && (tag#8 = tag#11))
!         :- SubqueryAlias ta                                      :- Union
!         :  +- Union                                              :  :- LocalRelation [a#0, tag#8]
!         :     :- Project [a#0, a AS tag#8]                       :  +- LocalRelation [a#2, tag#9]
!         :     :  +- SubqueryAlias t1                             +- Union
!         :     :     +- Project [a#0]                                :- LocalRelation [a#4, tag#11]
!         :     :        +- SubqueryAlias grouping                    +- LocalRelation [a#6, tag#12]
!         :     :           +- LocalRelation [a#0]
!         :     +- Project [a#2, b AS tag#9]
!         :        +- SubqueryAlias t2
!         :           +- Project [a#2]
!         :              +- SubqueryAlias grouping
!         :                 +- LocalRelation [a#2]
!         +- SubqueryAlias tb
!            +- Union
!               :- Project [a#4, a AS tag#11]
!               :  +- SubqueryAlias t3
!               :     +- Project [a#4]
!               :        +- SubqueryAlias grouping
!               :           +- LocalRelation [a#4]
!               +- Project [a#6, b AS tag#12]
!                  +- SubqueryAlias t4
!                     +- Project [a#6]
!                        +- SubqueryAlias grouping
!                           +- LocalRelation [a#6]
```
## How was this patch tested?
Added `sql-tests/inputs/inner-join.sql`.
All tests passed.
Author: Stan Zhai <zhaishidan@haizhi.com>
Closes #17099 from stanzhai/fix-inner-join.
Diffstat (limited to 'sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out')
0 files changed, 0 insertions, 0 deletions