author | Herman van Hovell <hvanhovell@databricks.com> | 2017-03-14 18:52:16 +0100 |
---|---|---|
committer | Herman van Hovell <hvanhovell@databricks.com> | 2017-03-14 18:52:16 +0100 |
commit | e04c05cf41a125b0526f59f9b9e7fdf0b78b8b21 (patch) | |
tree | 247a5b094cdf4ba72d801eb4ad5f7565ad757191 /sql/core | |
parent | 6325a2f82a95a63bee020122620bc4f5fd25d059 (diff) | |
[SPARK-19933][SQL] Do not change output of a subquery
## What changes were proposed in this pull request?
The `RemoveRedundantAlias` rule can change the output attributes of a query (more precisely, their expression IDs) by eliminating the redundant alias that produces them. This is harmless for a regular query, but it can break correlated subqueries: the parent plan references the attributes produced by the subquery, so changing them invalidates the parent plan.
This PR fixes the problem by wrapping a subquery in a top-level `Subquery` node when it is optimized. The `RemoveRedundantAlias` rule now recognizes `Subquery` and makes sure that the output attributes of the `Subquery` node are retained.
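The behaviour described above can be illustrated with a toy model. The sketch below does not use Spark's Catalyst classes; `Attr`, `Relation`, `Project`, `Subquery`, and `remove` are hypothetical stand-ins that only assume attributes are identified by expression ID and that the rule receives a set of attributes it must not drop.

```scala
// Toy model only: these classes are hypothetical and are NOT Spark's Catalyst API.
object AliasDemo {
  // An attribute is identified by its expression id, not just by its name.
  final case class Attr(name: String, exprId: Long)

  sealed trait Plan { def output: Seq[Attr] }
  final case class Relation(output: Seq[Attr]) extends Plan
  // Each pair is (input attribute, aliased output attribute with a fresh id).
  final case class Project(aliases: Seq[(Attr, Attr)], child: Plan) extends Plan {
    def output: Seq[Attr] = aliases.map(_._2)
  }
  // Marker node for the root of a subquery plan.
  final case class Subquery(child: Plan) extends Plan {
    def output: Seq[Attr] = child.output
  }

  // Removes a Project whose aliases all keep the original name, unless one of
  // the aliased attributes is in `keep` (attributes that must be retained).
  def remove(plan: Plan, keep: Set[Attr]): Plan = plan match {
    case Subquery(child) =>
      // Protect everything the subquery exposes to its parent plan.
      Subquery(remove(child, keep ++ child.output))
    case Project(aliases, child) =>
      val redundant = aliases.forall { case (in, out) =>
        in.name == out.name && !keep.contains(out)
      } && aliases.map(_._1) == child.output
      if (redundant) remove(child, keep)
      else Project(aliases, remove(child, keep))
    case r: Relation => r
  }

  def main(args: Array[String]): Unit = {
    val id = Attr("id", 1L)
    val aliased = Attr("id", 2L) // `id AS id` produces a new expression id
    val sub = Project(Seq(id -> aliased), Relation(Seq(id)))

    // Without the wrapper the alias is dropped and the output id changes.
    println(remove(sub, Set.empty).output)
    // With the wrapper the aliased attribute survives.
    println(remove(Subquery(sub), Set.empty).output)
  }
}
```

In this model a parent plan that refers to the attribute with expression ID 2 only stays valid in the second case, which is the property the `Subquery` wrapper is meant to guarantee.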
## How was this patch tested?
Added a test case to `RemoveRedundantAliasAndProjectSuite` and added a regression test to `SubquerySuite`.
Author: Herman van Hovell <hvanhovell@databricks.com>
Closes #17278 from hvanhovell/SPARK-19933.
Diffstat (limited to 'sql/core')
-rw-r--r-- | sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala | 14 |
1 files changed, 14 insertions, 0 deletions
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
index 6f1cd49c08..5fe6667cec 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
@@ -830,4 +830,18 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
         Row(1) :: Row(0) :: Nil)
     }
   }
+
+  test("SPARK-19933 Do not eliminate top-level aliases in sub-queries") {
+    withTempView("t1", "t2") {
+      spark.range(4).createOrReplaceTempView("t1")
+      checkAnswer(
+        sql("select * from t1 where id in (select id as id from t1)"),
+        Row(0) :: Row(1) :: Row(2) :: Row(3) :: Nil)
+
+      spark.range(2).createOrReplaceTempView("t2")
+      checkAnswer(
+        sql("select * from t1 where id in (select id as id from t2)"),
+        Row(0) :: Row(1) :: Nil)
+    }
+  }
 }