path: root/sql/core
authorCheng Lian <lian@databricks.com>2014-12-30 13:38:27 -0800
committerMichael Armbrust <michael@databricks.com>2014-12-30 13:38:27 -0800
commit61a99f6a11d85e931e7d60f9ab4370b3b40a52ef (patch)
treed80690362a7d4b65e85f40da279ee88d092d3094 /sql/core
parenta75dd83b72586695768c89ed32b240aa8f48f32c (diff)
[SPARK-4937][SQL] Normalizes conjunctions and disjunctions to eliminate common predicates
This PR is a simplified version of several filter optimization rules introduced in #3778 authored by scwf. Newly introduced optimizations include:

1. `a && a` => `a`
2. `a || a` => `a`
3. `(a || b || c || ...) && (a || b || d || ...)` => `a && b && (c || d || ...)`

The 3rd rule is particularly useful for optimizing the following query, which is planned into a Cartesian product:

```sql
SELECT * FROM t1, t2
WHERE (t1.key = t2.key AND t1.value > 10)
   OR (t1.key = t2.key AND t2.value < 20)
```

into the following one, which is planned into an equi-join:

```sql
SELECT * FROM t1, t2
WHERE t1.key = t2.key AND (t1.value > 10 OR t2.value < 20)
```

The example above is quite artificial, but common predicates are likely to appear in real-life complex queries (like the one mentioned in #3778).

A difference between this PR and #3778 is that these optimizations are not limited to `Filter`, but are generalized to all logical plan nodes. Thanks to scwf for bringing up these optimizations, and chenghao-intel for the generalization suggestion.

Author: Cheng Lian <lian@databricks.com>

Closes #3784 from liancheng/normalize-filters and squashes the following commits:

caca560 [Cheng Lian] Moves filter normalization into BooleanSimplification rule
4ab3a58 [Cheng Lian] Fixes test failure, adds more tests
5d54349 [Cheng Lian] Fixes typo in comment
2abbf8e [Cheng Lian] Forgot our sacred Apache licence header...
cf95639 [Cheng Lian] Adds an optimization rule for filter normalization
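The common-predicate extraction in rule 3 can be sketched as follows. This is a minimal, self-contained illustration on a hypothetical toy AST (`Pred`/`And`/`Or` and `extractCommon` are stand-ins invented here, not Spark's Catalyst expressions); the actual rule lives in Catalyst's `BooleanSimplification` optimizer rule and handles arbitrary expressions.

```scala
// Toy boolean-expression AST; using Set for children makes rules 1 and 2
// (`a && a` => `a`, `a || a` => `a`) fall out of set deduplication.
sealed trait Expr
case class Pred(name: String) extends Expr
case class And(children: Set[Expr]) extends Expr
case class Or(children: Set[Expr]) extends Expr

// Rule 3: (a || b || c) && (a || b || d)  =>  a && b && (c || d)
// `lhs` and `rhs` are the disjuncts of the two sides of the conjunction.
def extractCommon(lhs: Set[Expr], rhs: Set[Expr]): Expr = {
  val common = lhs intersect rhs          // predicates shared by both sides
  val ldiff  = lhs diff common
  val rdiff  = rhs diff common
  if (common.isEmpty) {
    And(Set(Or(lhs), Or(rhs)))            // nothing to factor out
  } else if (ldiff.isEmpty || rdiff.isEmpty) {
    Or(common)                            // absorption: (a||b) && (a||b||d) => a||b
  } else {
    And(common + Or(ldiff union rdiff))   // factored common predicates
  }
}
```

Applied to the query above, the shared `t1.key = t2.key` disjunct is factored out of both sides, leaving an equi-join condition conjoined with `(t1.value > 10 OR t2.value < 20)`.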
Diffstat (limited to 'sql/core')
-rw-r--r--sql/core/src/test/scala/org/apache/spark/sql/columnar/PartitionBatchPruningSuite.scala10
1 file changed, 7 insertions, 3 deletions
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/columnar/PartitionBatchPruningSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/columnar/PartitionBatchPruningSuite.scala
index 82afa31a99..1915c25392 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/columnar/PartitionBatchPruningSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/columnar/PartitionBatchPruningSuite.scala
@@ -105,7 +105,9 @@ class PartitionBatchPruningSuite extends FunSuite with BeforeAndAfterAll with Be
test(query) {
val schemaRdd = sql(query)
- assertResult(expectedQueryResult.toArray, "Wrong query result") {
+ val queryExecution = schemaRdd.queryExecution
+
+ assertResult(expectedQueryResult.toArray, s"Wrong query result: $queryExecution") {
schemaRdd.collect().map(_.head).toArray
}
@@ -113,8 +115,10 @@ class PartitionBatchPruningSuite extends FunSuite with BeforeAndAfterAll with Be
case in: InMemoryColumnarTableScan => (in.readPartitions.value, in.readBatches.value)
}.head
- assert(readBatches === expectedReadBatches, "Wrong number of read batches")
- assert(readPartitions === expectedReadPartitions, "Wrong number of read partitions")
+ assert(readBatches === expectedReadBatches, s"Wrong number of read batches: $queryExecution")
+ assert(
+ readPartitions === expectedReadPartitions,
+ s"Wrong number of read partitions: $queryExecution")
}
}
}