[SPARK-16777][SQL] Do not use deprecated listType API in ParquetSchemaConverter

## What changes were proposed in this pull request?

This PR removes the build warnings below.

```scala
[WARNING] .../spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:448: method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information.
[WARNING] ConversionPatterns.listType(
[WARNING] ^
[WARNING] .../spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:464: method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information.
[WARNING] ConversionPatterns.listType(
[WARNING] ^
```

`listOfElements` (the recommended replacement for `listType`) should not be used here, because the new method checks whether the element name in Parquet's `LIST` is `element` in the Parquet schema and throws an exception if it is not. However, it seems Spark prior to 1.4.x writes `ArrayType` as Parquet's `LIST` but with `array` as its element name.

Therefore, this PR avoids both `listOfElements` and `listType` and instead uses the existing schema builder to construct the same `GroupType`.

## How was this patch tested?

Existing tests should cover this.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #14399 from HyukjinKwon/SPARK-16777.
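For illustration, the builder-based construction described above can be sketched outside Spark as follows. This is a minimal sketch, not the converter's actual code: it assumes the `parquet-column` artifact is on the classpath with a version in which `Types.buildGroup(...).as(OriginalType)` is available, and the object name `LegacyListSketch`, the helper `legacyIntList`, and the choice of a nullable `INT32` element are hypothetical, picked only to keep the example small.

```scala
import org.apache.parquet.schema.{GroupType, OriginalType, Types}
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT32
import org.apache.parquet.schema.Type.Repetition.{OPTIONAL, REPEATED}

// Hypothetical sketch: none of these names come from the Spark code base.
object LegacyListSketch {

  // Builds a LIST-annotated group whose inner repeated group is named "bag"
  // and whose element field is named "array" (the legacy element name),
  // using only the generic schema builder.
  def legacyIntList(name: String): GroupType =
    Types
      .buildGroup(OPTIONAL).as(OriginalType.LIST)
      .addField(
        Types
          .buildGroup(REPEATED)
          // Element deliberately named "array", not "element".
          .addField(Types.primitive(INT32, OPTIONAL).named("array"))
          .named("bag"))
      .named(name)

  def main(args: Array[String]): Unit = {
    // Prints the textual form of the constructed schema fragment.
    println(legacyIntList("f"))
  }
}
```

Because the builder places no constraint on the element name, the legacy layout can be reproduced without going through either `ConversionPatterns` helper.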
author: hyukjinkwon <gurwls223@gmail.com> 2016-09-28 00:39:47 +0800
committer: Cheng Lian <lian@databricks.com> 2016-09-28 00:39:47 +0800
commit: 5de1737b02710e36f6804d2ae243d1aeb30a0b32 (patch)
tree: f11d8c8057f66ccf35e1186e5cedeef18e390af5
parent: 6a68c5d7b4eb07e4ed6b702dd1536cd08d9bba7d (diff)
1 files changed, 17 insertions, 9 deletions
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
index c81a65f497..b4f36ce375 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
@@ -445,14 +445,20 @@ private[parquet] class ParquetSchemaConverter(
         //     repeated <element-type> array;
         //   }
         // }
-        ConversionPatterns.listType(
-          repetition,
-          field.name,
-          Types
+
+        // This should not use `listOfElements` here because this new method checks if the
+        // element name is `element` in the `GroupType` and throws an exception if not.
+        // As mentioned above, Spark prior to 1.4.x writes `ArrayType` as `LIST` but with
+        // `array` as its element name as below. Therefore, we build manually
+        // the correct group type here via the builder. (See SPARK-16777)
+        Types
+          .buildGroup(repetition).as(LIST)
+          .addField(Types
             .buildGroup(REPEATED)
-            // "array_element" is the name chosen by parquet-hive (1.7.0 and prior version)
+            // "array" is the name chosen by parquet-hive (1.7.0 and prior version)
             .addField(convertField(StructField("array", elementType, nullable)))
             .named("bag"))
+          .named(field.name)
 
       // Spark 1.4.x and prior versions convert ArrayType with non-nullable elements into a 2-level
       // LIST structure.  This behavior mimics parquet-avro (1.6.0rc3).  Note that this case is
@@ -461,11 +467,13 @@ private[parquet] class ParquetSchemaConverter(
         // <list-repetition> group <name> (LIST) {
         //   repeated <element-type> element;
         // }
-        ConversionPatterns.listType(
-          repetition,
-          field.name,
+
+        // Here too, we should not use `listOfElements`. (See SPARK-16777)
+        Types
+          .buildGroup(repetition).as(LIST)
           // "array" is the name chosen by parquet-avro (1.7.0 and prior version)
-          convertField(StructField("array", elementType, nullable), REPEATED))
+          .addField(convertField(StructField("array", elementType, nullable), REPEATED))
+          .named(field.name)
 
       // Spark 1.4.x and prior versions convert MapType into a 3-level group annotated by
       // MAP_KEY_VALUE.  This is covered by `convertGroupField(field: GroupType): DataType`.