[SPARK-7567] [SQL] Migrating Parquet data source to FSBasedRelation

This PR migrates the Parquet data source to the newly introduced `FSBasedRelation`. `FSBasedParquetRelation` is created to replace `ParquetRelation2`. Major differences are:

1. Partition discovery code has been factored out to `FSBasedRelation`
2. `AppendingParquetOutputFormat` is no longer used. Instead, an anonymous subclass of `ParquetOutputFormat` handles appending and writing dynamic partitions
3. When scanning partitioned tables, `FSBasedParquetRelation.buildScan` only builds an `RDD[Row]` for a single selected partition
4. `FSBasedParquetRelation` doesn't rely on Catalyst expressions for filter push down, thus it doesn't extend `CatalystScan` anymore

After migrating `JSONRelation` (which extends `CatalystScan`), we can remove `CatalystScan`.

Author: Cheng Lian <lian@databricks.com>

Closes #6090 from liancheng/parquet-migration and squashes the following commits:

6063f87 [Cheng Lian] Casts to OutputCommitter rather than FileOutputCommitter
bfd1cf0 [Cheng Lian] Fixes compilation error introduced while rebasing
f9ea56e [Cheng Lian] Adds ParquetRelation2 related classes to MiMa check whitelist
261d8c1 [Cheng Lian] Minor bug fix and more tests
db65660 [Cheng Lian] Migrates Parquet data source to FSBasedRelation
author: Cheng Lian <lian@databricks.com> 2015-05-13 11:04:10 -0700
committer: Michael Armbrust <michael@databricks.com> 2015-05-13 11:04:10 -0700
commit: 7ff16e8abef9fbf4a4855e23c256b22e62e560a6 (patch)
tree: 1be1249ecb9db02ef5bf8820f7c44a7fbe71a6ff /project
parent: bec938f777a2e18757c7d04504d86a5342e2b49e (diff)
download: spark-7ff16e8abef9fbf4a4855e23c256b22e62e560a6.tar.gz
spark-7ff16e8abef9fbf4a4855e23c256b22e62e560a6.tar.bz2
spark-7ff16e8abef9fbf4a4855e23c256b22e62e560a6.zip
1 files changed, 6 insertions, 0 deletions
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index a47e29e2ef..f31f0e554e 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -111,6 +111,12 @@ object MimaExcludes {
               "org.apache.spark.sql.parquet.ParquetRelation2$PartitionValues"),
             ProblemFilters.exclude[MissingClassProblem](
               "org.apache.spark.sql.parquet.ParquetRelation2$PartitionValues$"),
+            ProblemFilters.exclude[MissingClassProblem](
+              "org.apache.spark.sql.parquet.ParquetRelation2"),
+            ProblemFilters.exclude[MissingClassProblem](
+              "org.apache.spark.sql.parquet.ParquetRelation2$"),
+            ProblemFilters.exclude[MissingClassProblem](
+              "org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache"),
             // These test support classes were moved out of src/main and into src/test:
             ProblemFilters.exclude[MissingClassProblem](
               "org.apache.spark.sql.parquet.ParquetTestData"),
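The three exclusions added to `MimaExcludes.scala` differ only in their `$` suffixes. A minimal sketch of why one Scala type can need several entries, assuming the usual Scala-to-JVM class naming: a class, its companion object, and a nested class each compile to a separate JVM class, and MiMa reports each missing one on its own. `Relation`, `MetadataCache`, and `ClassNameDemo` are hypothetical stand-ins for illustration, not Spark code.

```scala
// Hypothetical stand-ins for ParquetRelation2 and its nested type; not Spark code.
class Relation {
  class MetadataCache
}

// Companion object: compiled to a separate JVM class whose name ends in `$`.
object Relation

object ClassNameDemo {
  def main(args: Array[String]): Unit = {
    // The class itself.
    println(classOf[Relation].getName)
    // The companion object's class.
    println(Relation.getClass.getName)
    // The nested class, joined to its outer class with `$`.
    println(classOf[Relation#MetadataCache].getName)
  }
}
```

Compiled at top level without a package, this prints `Relation`, `Relation$`, and `Relation$MetadataCache`, which mirrors the shape of the three `ParquetRelation2` entries in the diff.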