Unify the logic for column pruning, projection, and filtering of table scans.

This removes duplicated logic, dead code, and casting when planning Parquet table scans and Hive table scans.

Other changes:
 - Fix tests now that we are doing a better job of column pruning (i.e., since pruning predicates are applied before we even start scanning tuples, columns required by these predicates do not need to be included in the output of the scan unless they are also included in the final output of this logical plan fragment).
 - Add rule to simplify trivial filters. This was required to prevent `WHERE false` from being pushed into table scans, since `HiveTableScan` (reasonably) refuses to apply partition pruning predicates to non-partitioned tables.

Author: Michael Armbrust <michael@databricks.com>

Closes #213 from marmbrus/strategyCleanup and squashes the following commits:

48ce403 [Michael Armbrust] Move one more bit of parquet stuff into the core SQLContext.
834ce08 [Michael Armbrust] Address comments.
0f2c6f5 [Michael Armbrust] Unify the logic for column pruning, projection, and filtering of table scans for both Hive and Parquet relations. Fix tests now that we are doing a better job of column pruning.
author: Michael Armbrust <michael@databricks.com> 2014-03-24 22:15:51 -0700
committer: Patrick Wendell <pwendell@gmail.com> 2014-03-24 22:15:51 -0700
commit: b637f2d91ab4d3d5bf13e8d959c919ebd776f6af (patch)
tree: 8c6555150402e804f00eca24e7c71eebc3426a23 /sql/catalyst
parent: 5140598df889f7227c9d6a7953031eeef524badd (diff)
1 files changed, 17 insertions, 0 deletions
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 07ebbc90fc..f28076999d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -30,6 +30,7 @@ object Optimizer extends RuleExecutor[LogicalPlan] {
     Batch("ConstantFolding", Once,
       ConstantFolding,
       BooleanSimplification,
+      SimplifyFilters,
       SimplifyCasts) ::
     Batch("Filter Pushdown", Once,
       CombineFilters,
@@ -91,6 +92,22 @@ object CombineFilters extends Rule[LogicalPlan] {
 }
 
 /**
+ * Removes filters that can be evaluated trivially.  This is done either by eliding the filter for
+ * cases where it will always evaluate to `true`, or substituting a dummy empty relation when the
+ * filter will always evaluate to `false`.
+ */
+object SimplifyFilters extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case Filter(Literal(true, BooleanType), child) =>
+      child
+    case Filter(Literal(null, _), child) =>
+      LocalRelation(child.output)
+    case Filter(Literal(false, BooleanType), child) =>
+      LocalRelation(child.output)
+  }
+}
+
+/**
  * Pushes [[catalyst.plans.logical.Filter Filter]] operators through
  * [[catalyst.plans.logical.Project Project]] operators, in-lining any
  * [[catalyst.expressions.Alias Aliases]] that were defined in the projection.
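
The behavior of the `SimplifyFilters` rule added above can be illustrated outside of Spark. The following is a minimal sketch, not the Catalyst implementation: `Expr`, `Lit`, `Attr`, `Plan`, `Scan`, `EmptyRelation`, and `SimplifyFiltersSketch` are hypothetical stand-ins invented for this example, with `EmptyRelation` playing the role that `LocalRelation(child.output)` plays in the diff. The sketch rewrites the tree bottom-up for simplicity, which is an assumption of this example and not a statement about how Catalyst's `transform` traverses a plan.

```scala
// Hypothetical stand-ins for Catalyst's expression and plan classes (not Spark's API).
sealed trait Expr
case class Lit(value: Any) extends Expr       // a literal; a null value models SQL NULL
case class Attr(name: String) extends Expr    // a column reference

sealed trait Plan { def output: Seq[String] }
case class Scan(table: String, output: Seq[String]) extends Plan
// An empty relation that keeps the schema of the plan it replaces.
case class EmptyRelation(output: Seq[String]) extends Plan
case class Filter(condition: Expr, child: Plan) extends Plan {
  def output: Seq[String] = child.output
}

object SimplifyFiltersSketch {
  // Simplify the child first, then decide what to do with the filter itself.
  def simplify(plan: Plan): Plan = plan match {
    case Filter(cond, child) =>
      val newChild = simplify(child)
      cond match {
        // Always true: the filter keeps every row, so drop it.
        case Lit(true) => newChild
        // Always false or NULL: no row can pass, so substitute an empty relation.
        case Lit(false) | Lit(null) => EmptyRelation(newChild.output)
        // Anything else is left in place.
        case _ => Filter(cond, newChild)
      }
    case other => other
  }
}
```

With these definitions, `SimplifyFiltersSketch.simplify(Filter(Lit(false), Scan("t", Seq("a", "b"))))` yields `EmptyRelation(Seq("a", "b"))`, so a constant-false predicate never reaches the scan node. This mirrors the motivation given in the commit message for keeping `WHERE false` out of table scans.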