author:    Cheng Lian <lian@databricks.com>    2014-11-17 16:55:12 -0800
committer: Michael Armbrust <michael@databricks.com>    2014-11-17 16:55:12 -0800
commit:    36b0956a3eadc7343ed0d25c79a6ce0496eaaccd (patch)
tree:      47fba8e9a00b21b20b77342a9a45f9d0f9969489 /sql/catalyst
parent:    ef7c464effa1510b24bd8e665e4df6c4839b0c87 (diff)
[SPARK-4453][SPARK-4213][SQL] Simplifies Parquet filter generation code
While reviewing PR #3083 and #3161, I noticed that the Parquet record filter generation code can be simplified significantly, following the clue stated in [SPARK-4453](https://issues.apache.org/jira/browse/SPARK-4453). This PR addresses both SPARK-4453 and SPARK-4213 with this simplification.

While generating the `ParquetTableScan` operator, we need to remove all Catalyst predicates that have already been pushed down to Parquet. Originally, we first generated the record filter and then called `findExpression` to traverse the generated filter and find all pushed-down predicates [[1](https://github.com/apache/spark/blob/64c6b9bad559c21f25cd9fbe37c8813cdab939f2/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L213-L228)]. This approach forced us to introduce the `CatalystFilter` class hierarchy to bind each Catalyst predicate to its generated Parquet filter, which complicated the code base a lot.

The basic idea of this PR is that we don't need `findExpression` after filter generation, because we already know a predicate can be pushed down if we can successfully generate its corresponding Parquet filter. SPARK-4213 is fixed by returning `None` for any unsupported predicate type.

Author: Cheng Lian <lian@databricks.com>

Closes #3317 from liancheng/simplify-parquet-filters and squashes the following commits:

d6a9499 [Cheng Lian] Fixes import styling issue
43760e8 [Cheng Lian] Simplifies Parquet filter generation logic
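To make the idea concrete, here is a minimal script-style Scala sketch of the approach (stand-in types for illustration only, not Spark's actual `ParquetFilters` code): filter generation itself doubles as the push-down check, so no separate `findExpression` traversal over the generated filter is needed.

```scala
// Simplified stand-ins for Catalyst predicates and Parquet filters.
sealed trait Predicate
case class Eq(column: String, value: Int) extends Predicate
case class Gt(column: String, value: Int) extends Predicate
case class Unsupported(description: String) extends Predicate

case class ParquetFilter(repr: String)

// Returning None for anything we cannot translate is also how SPARK-4213
// is fixed: unsupported predicate types simply yield no Parquet filter.
def createFilter(p: Predicate): Option[ParquetFilter] = p match {
  case Eq(c, v) => Some(ParquetFilter(s"$c = $v"))
  case Gt(c, v) => Some(ParquetFilter(s"$c > $v"))
  case _        => None
}

// No post-hoc traversal: a predicate is pushed down exactly when
// filter generation succeeds, so a simple partition suffices.
val predicates = Seq(Eq("a", 1), Unsupported("udf(a)"), Gt("b", 10))
val (pushed, remaining) = predicates.partition(p => createFilter(p).isDefined)
// pushed = Seq(Eq("a", 1), Gt("b", 10)); remaining = Seq(Unsupported("udf(a)"))
```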
Diffstat (limited to 'sql/catalyst')
-rw-r--r--  sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala  1
1 file changed, 1 insertion, 0 deletions
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
index fc90a54a58..7634d392d4 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
@@ -26,6 +26,7 @@ import org.apache.spark.sql.catalyst.util.Metadata
 object NamedExpression {
   private val curId = new java.util.concurrent.atomic.AtomicLong()
   def newExprId = ExprId(curId.getAndIncrement())
+  def unapply(expr: NamedExpression): Option[(String, DataType)] = Some(expr.name, expr.dataType)
 }

 /**
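For context on what this one-line change enables: the companion `unapply` turns `NamedExpression` into an extractor, so any named expression can be destructured by name and data type in a single pattern. A minimal self-contained sketch with stand-in types (not Spark's actual Catalyst classes):

```scala
sealed trait DataType
case object IntegerType extends DataType

trait NamedExpression { def name: String; def dataType: DataType }
case class AttributeReference(name: String, dataType: DataType) extends NamedExpression

object NamedExpression {
  // Mirrors the extractor added in this commit, with explicit tuple parens.
  def unapply(expr: NamedExpression): Option[(String, DataType)] =
    Some((expr.name, expr.dataType))
}

val attr: NamedExpression = AttributeReference("id", IntegerType)
attr match {
  case NamedExpression(name, dataType) =>
    println(s"$name: $dataType") // prints "id: IntegerType"
}
```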