[SPARK-16181][SQL] outer join with isNull filter may return wrong result

## What changes were proposed in this pull request? The root cause is: the output attributes of outer join are derived from its children, while they are actually different attributes(outer join can return null). We have already added some special logic to handle it, e.g. `PushPredicateThroughJoin` won't push down predicates through outer join side, `FixNullability`. This PR adds one more special logic in `FoldablePropagation`. ## How was this patch tested? new test in `DataFrameSuite` Author: Wenchen Fan <wenchen@databricks.com> Closes #13884 from cloud-fan/bug.
author: Wenchen Fan <wenchen@databricks.com> 2016-06-28 10:26:01 -0700
committer: Yin Huai <yhuai@databricks.com> 2016-06-28 10:26:01 -0700
commit: 1f2776df6e87a84991537ac20e4b8829472d3462 (patch)
tree: 6a1bb15ed398b129c4243bb7447cf707622c90d6
parent: 0923c4f5676691e28e70ecb05890e123540b91f0 (diff)
download: spark-1f2776df6e87a84991537ac20e4b8829472d3462.tar.gz
spark-1f2776df6e87a84991537ac20e4b8829472d3462.tar.bz2
spark-1f2776df6e87a84991537ac20e4b8829472d3462.zip
2 files changed, 17 insertions, 0 deletions
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 2bca31d5f1..9bc8cea377 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -688,6 +688,14 @@ object FoldablePropagation extends Rule[LogicalPlan] {
         case c: Command =>
           stop = true
           c
+        // For outer join, although its output attributes are derived from its children, they are
+        // actually different attributes: the output of outer join is not always picked from its
+        // children, but can also be null.
+        // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes
+        // of outer join.
+        case j @ Join(_, _, LeftOuter | RightOuter | FullOuter, _) =>
+          stop = true
+          j
         case p: LogicalPlan if !stop => p.transformExpressions {
           case a: AttributeReference if foldableMap.contains(a) =>
             foldableMap(a)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
index 6a0a7df3f4..9d53be8e2b 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
@@ -1562,4 +1562,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     val df = Seq(1, 1, 2).toDF("column.with.dot")
     checkAnswer(df.distinct(), Row(1) :: Row(2) :: Nil)
   }
+
+  test("SPARK-16181: outer join with isNull filter") {
+    val left = Seq("x").toDF("col")
+    val right = Seq("y").toDF("col").withColumn("new", lit(true))
+    val joined = left.join(right, left("col") === right("col"), "left_outer")
+
+    checkAnswer(joined, Row("x", null, null))
+    checkAnswer(joined.filter($"new".isNull), Row("x", null, null))
+  }
 }
author	Wenchen Fan <wenchen@databricks.com>	2016-06-28 10:26:01 -0700
committer	Yin Huai <yhuai@databricks.com>	2016-06-28 10:26:01 -0700
commit	1f2776df6e87a84991537ac20e4b8829472d3462 (patch)
tree	6a1bb15ed398b129c4243bb7447cf707622c90d6
parent	0923c4f5676691e28e70ecb05890e123540b91f0 (diff)
download	spark-1f2776df6e87a84991537ac20e4b8829472d3462.tar.gz spark-1f2776df6e87a84991537ac20e4b8829472d3462.tar.bz2 spark-1f2776df6e87a84991537ac20e4b8829472d3462.zip