path: root/sql/core
authorCheng Lian <lian@databricks.com>2014-12-30 13:38:27 -0800
committerMichael Armbrust <michael@databricks.com>2014-12-30 13:38:27 -0800
commit61a99f6a11d85e931e7d60f9ab4370b3b40a52ef (patch)
treed80690362a7d4b65e85f40da279ee88d092d3094 /sql/core
parenta75dd83b72586695768c89ed32b240aa8f48f32c (diff)
[SPARK-4937][SQL] Normalizes conjunctions and disjunctions to eliminate common predicates
This PR is a simplified version of several filter optimization rules introduced in #3778 authored by scwf. Newly introduced optimizations include:

1. `a && a` => `a`
2. `a || a` => `a`
3. `(a || b || c || ...) && (a || b || d || ...)` => `a && b && (c || d || ...)`

The 3rd rule is particularly useful for optimizing the following query, which is planned into a Cartesian product:

```sql
SELECT * FROM t1, t2
WHERE (t1.key = t2.key AND t1.value > 10)
   OR (t1.key = t2.key AND t2.value < 20)
```

into the following one, which is planned into an equi-join:

```sql
SELECT * FROM t1, t2
WHERE t1.key = t2.key AND (t1.value > 10 OR t2.value < 20)
```

The example above is quite artificial, but common predicates are likely to appear in real-life complex queries (like the one mentioned in #3778).

A difference between this PR and #3778 is that these optimizations are not limited to `Filter`, but are generalized to all logical plan nodes. Thanks to scwf for bringing up these optimizations, and chenghao-intel for the generalization suggestion.

Author: Cheng Lian <lian@databricks.com>

Closes #3784 from liancheng/normalize-filters and squashes the following commits:

caca560 [Cheng Lian] Moves filter normalization into BooleanSimplification rule
4ab3a58 [Cheng Lian] Fixes test failure, adds more tests
5d54349 [Cheng Lian] Fixes typo in comment
2abbf8e [Cheng Lian] Forgot our sacred Apache licence header...
cf95639 [Cheng Lian] Adds an optimization rule for filter normalization
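The common-predicate extraction in rule 3 can be sketched as follows. This is a minimal, self-contained illustration on a hypothetical toy AST (`Pred`/`And`/`Or` and `extractCommon` are stand-ins invented here, not Spark's Catalyst expressions); the actual rule lives in Catalyst's `BooleanSimplification` optimizer rule and handles arbitrary expressions.

```scala
// Toy boolean-expression AST; using Set for children makes rules 1 and 2
// (`a && a` => `a`, `a || a` => `a`) fall out of set deduplication.
sealed trait Expr
case class Pred(name: String) extends Expr
case class And(children: Set[Expr]) extends Expr
case class Or(children: Set[Expr]) extends Expr

// Rule 3: (a || b || c) && (a || b || d)  =>  a && b && (c || d)
// `lhs` and `rhs` are the disjuncts of the two sides of the conjunction.
def extractCommon(lhs: Set[Expr], rhs: Set[Expr]): Expr = {
  val common = lhs intersect rhs          // predicates shared by both sides
  val ldiff  = lhs diff common
  val rdiff  = rhs diff common
  if (common.isEmpty) {
    And(Set(Or(lhs), Or(rhs)))            // nothing to factor out
  } else if (ldiff.isEmpty || rdiff.isEmpty) {
    Or(common)                            // absorption: (a||b) && (a||b||d) => a||b
  } else {
    And(common + Or(ldiff union rdiff))   // factored common predicates
  }
}
```

Applied to the query above, the shared `t1.key = t2.key` disjunct is factored out of both sides, leaving an equi-join condition conjoined with `(t1.value > 10 OR t2.value < 20)`.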
Diffstat (limited to 'sql/core')
-rw-r--r--sql/core/src/test/scala/org/apache/spark/sql/columnar/PartitionBatchPruningSuite.scala10
1 file changed, 7 insertions, 3 deletions
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/columnar/PartitionBatchPruningSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/columnar/PartitionBatchPruningSuite.scala
index 82afa31a99..1915c25392 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/columnar/PartitionBatchPruningSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/columnar/PartitionBatchPruningSuite.scala
@@ -105,7 +105,9 @@ class PartitionBatchPruningSuite extends FunSuite with BeforeAndAfterAll with Be
test(query) {
val schemaRdd = sql(query)
- assertResult(expectedQueryResult.toArray, "Wrong query result") {
+ val queryExecution = schemaRdd.queryExecution
+
+ assertResult(expectedQueryResult.toArray, s"Wrong query result: $queryExecution") {
schemaRdd.collect().map(_.head).toArray
}
@@ -113,8 +115,10 @@ class PartitionBatchPruningSuite extends FunSuite with BeforeAndAfterAll with Be
case in: InMemoryColumnarTableScan => (in.readPartitions.value, in.readBatches.value)
}.head
- assert(readBatches === expectedReadBatches, "Wrong number of read batches")
- assert(readPartitions === expectedReadPartitions, "Wrong number of read partitions")
+ assert(readBatches === expectedReadBatches, s"Wrong number of read batches: $queryExecution")
+ assert(
+ readPartitions === expectedReadPartitions,
+ s"Wrong number of read partitions: $queryExecution")
}
}
}