[SPARK-17354] [SQL] Partitioning by dates/timestamps should work with Parquet vectorized reader

## What changes were proposed in this pull request?

This PR fixes `ColumnVectorUtils.populate` so that the Parquet vectorized reader can read tables partitioned by dates/timestamps. The non-vectorized Parquet reader already handles this case correctly.

`ColumnVectorUtils.populate` is only called from [VectorizedParquetRecordReader.java#L185](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java#L185).

When the partition column types are explicitly specified as `DateType` or `TimestampType` (rather than inferred from the partition values), reading fails with the exception below:

```
16/09/01 10:30:07 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 6)
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.sql.Date
	at org.apache.spark.sql.execution.vectorized.ColumnVectorUtils.populate(ColumnVectorUtils.java:89)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:185)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:204)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:362)
	...
```

## How was this patch tested?

Unit tests in `SQLQuerySuite`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #14919 from HyukjinKwon/SPARK-17354.
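For readers unfamiliar with the failure mode, here is a minimal, standalone Java sketch of the `ClassCastException` in the stack trace above. It is not the code from `ColumnVectorUtils`. The class `PartitionDateSketch` and both helper methods are hypothetical names made up for illustration. It assumes, as the exception message suggests, that the date partition value arrives as an `Integer` holding the number of days since 1970-01-01 rather than as a `java.sql.Date`.

```java
import java.sql.Date;
import java.time.LocalDate;

// Hypothetical illustration only; not part of Spark.
public class PartitionDateSketch {

    // Assumed failing pattern: treat the partition value as an external java.sql.Date.
    static int daysViaJavaSqlDate(Object partitionValue) {
        Date date = (Date) partitionValue; // throws ClassCastException when the value is an Integer
        return Math.toIntExact(date.toLocalDate().toEpochDay());
    }

    // Assumed working pattern: the value is already a day count, so read it as an int.
    static int daysViaInternalInt(Object partitionValue) {
        return (Integer) partitionValue;
    }

    public static void main(String[] args) {
        // 1990-02-24 (the date used in the unit test) as days since 1970-01-01.
        Object partitionValue = Math.toIntExact(LocalDate.of(1990, 2, 24).toEpochDay());

        try {
            daysViaJavaSqlDate(partitionValue);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getMessage());
        }

        System.out.println("days since epoch: " + daysViaInternalInt(partitionValue));
    }
}
```

The second helper reflects one plausible shape for a fix: read the value in the representation it already has instead of casting it to an external type. Whether the actual patch does exactly this should be checked against the full commit, since only the test file is shown in this view.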
author: hyukjinkwon <gurwls223@gmail.com> 2016-09-09 14:23:05 -0700
committer: Davies Liu <davies.liu@gmail.com> 2016-09-09 14:23:05 -0700
commit: f7d2143705c8c1baeed0bc62940f9dba636e705b (patch)
tree: 8067836599fbfb1a71595fb01551e0f775c6b644 /sql/hive
parent: a3981c28c956a82ccf5b1c61d45b6bd252d4abed (diff)
download: spark-f7d2143705c8c1baeed0bc62940f9dba636e705b.tar.gz
spark-f7d2143705c8c1baeed0bc62940f9dba636e705b.tar.bz2
spark-f7d2143705c8c1baeed0bc62940f9dba636e705b.zip
1 files changed, 21 insertions, 0 deletions
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
index 05d0687fb7..dc4d099f0f 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
@@ -1787,6 +1787,27 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
 
+  test("SPARK-17354: Partitioning by dates/timestamps works with Parquet vectorized reader") {
+    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true") {
+      sql(
+        """CREATE TABLE order(id INT)
+          |PARTITIONED BY (pd DATE, pt TIMESTAMP)
+          |STORED AS PARQUET
+        """.stripMargin)
+
+      sql("set hive.exec.dynamic.partition.mode=nonstrict")
+      sql(
+        """INSERT INTO TABLE order PARTITION(pd, pt)
+          |SELECT 1 AS id, CAST('1990-02-24' AS DATE) AS pd, CAST('1990-02-24' AS TIMESTAMP) AS pt
+        """.stripMargin)
+      val actual = sql("SELECT * FROM order")
+      val expected = sql(
+        "SELECT 1 AS id, CAST('1990-02-24' AS DATE) AS pd, CAST('1990-02-24' AS TIMESTAMP) AS pt")
+      checkAnswer(actual, expected)
+      sql("DROP TABLE order")
+    }
+  }
+
   def testCommandAvailable(command: String): Boolean = {
     val attempt = Try(Process(command).run(ProcessLogger(_ => ())).exitValue())
     attempt.isSuccess && attempt.get == 0