author Damian Guy <damian.guy@gmail.com> 2015-08-11 12:46:33 +0800
committer Cheng Lian <lian@databricks.com> 2015-08-11 12:46:33 +0800
commit 071bbad5db1096a548c886762b611a8484a52753 (patch)
tree 5ef7be83e9fa717f01a04d9ccfdb5dfb5d9938c1 /sql/core/src/test/resources
parent 3c9802d9400bea802984456683b2736a450ee17e (diff)
[SPARK-9340] [SQL] Fixes converting unannotated Parquet lists
This PR is inspired by #8063, authored by dguy. In particular, the Parquet test files added here are all taken from that PR.

**Committer who merges this PR should attribute it to "Damian Guy <damian.guy@gmail.com>".**

----

SPARK-6776 and SPARK-6777 followed `parquet-avro` to implement the backwards-compatibility rules defined in the `parquet-format` spec. However, both Spark SQL and `parquet-avro` neglected the following statement in `parquet-format`:

> This does not affect repeated fields that are not annotated: A repeated field that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated by `LIST` or `MAP` should be interpreted as a required list of required elements where the element type is the type of the field.

One consequence is that Parquet files generated by `parquet-protobuf` that contain unannotated repeated fields are not correctly converted to Catalyst arrays.

This PR fixes the issue by

1. Handling unannotated repeated fields in `CatalystSchemaConverter`.
2. Converting these special repeated fields to Catalyst arrays in `CatalystRowConverter`.

   Two special converters, `RepeatedPrimitiveConverter` and `RepeatedGroupConverter`, are added. They delegate the actual conversion work to a child `elementConverter` and accumulate elements in an `ArrayBuffer`.

   Two extra methods, `start()` and `end()`, are added to `ParentContainerUpdater` so that they can be used to initialize new `ArrayBuffer`s for unannotated repeated fields and to propagate converted array values upstream.

Author: Cheng Lian <lian@databricks.com>

Closes #8070 from liancheng/spark-9340/unannotated-parquet-list and squashes the following commits:

ace6df7 [Cheng Lian] Moves ParquetProtobufCompatibilitySuite
f1c7bfd [Cheng Lian] Updates .rat-excludes
420ad2b [Cheng Lian] Fixes converting unannotated Parquet lists
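The rule quoted from `parquet-format` and the `start()`/`end()` accumulation pattern described above can be illustrated with a small toy model. This is a minimal Python sketch and not the Scala code from this commit: `ParquetField`, `is_unannotated_repeated`, `ParentUpdater`, `RowSlotUpdater`, `RepeatedFieldUpdater` and `convert_record` are all hypothetical names invented for illustration, and the real converters in `CatalystRowConverter` are wired into Parquet's converter callbacks rather than called by hand.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class ParquetField:
    """Hypothetical, simplified description of one Parquet field."""
    name: str
    repetition: str                   # "required", "optional" or "repeated"
    element_type: str                 # e.g. "int32", "binary"
    annotation: Optional[str] = None  # e.g. "LIST", "MAP", or None


def is_unannotated_repeated(field: ParquetField,
                            parent_annotation: Optional[str] = None) -> bool:
    """True when the quoted rule applies: the field is repeated, is not
    annotated LIST/MAP, and is not inside a LIST/MAP-annotated group."""
    annotated = ("LIST", "MAP")
    return (field.repetition == "repeated"
            and field.annotation not in annotated
            and parent_annotation not in annotated)


class ParentUpdater:
    """Toy counterpart of an updater with start()/end() hooks (no-ops here)."""

    def start(self) -> None:
        pass

    def end(self) -> None:
        pass

    def set(self, value: Any) -> None:
        raise NotImplementedError


class RowSlotUpdater(ParentUpdater):
    """Writes a finished value into one slot of a row (a plain list here)."""

    def __init__(self, row: List[Any], ordinal: int) -> None:
        self.row = row
        self.ordinal = ordinal

    def set(self, value: Any) -> None:
        self.row[self.ordinal] = value


class RepeatedFieldUpdater(ParentUpdater):
    """Accumulates every occurrence of a repeated field for one record."""

    def __init__(self, parent: ParentUpdater) -> None:
        self.parent = parent
        self.buffer: List[Any] = []

    def start(self) -> None:
        self.buffer = []              # fresh buffer for each record

    def set(self, value: Any) -> None:
        self.buffer.append(value)     # one call per occurrence of the field

    def end(self) -> None:
        self.parent.set(self.buffer)  # propagate the finished array upstream


def convert_record(values: List[Any]) -> List[Any]:
    """Drives the updater by hand for a single-column record."""
    row: List[Any] = [None]
    updater = RepeatedFieldUpdater(RowSlotUpdater(row, 0))
    updater.start()
    for value in values:
        updater.set(value)
    updater.end()
    return row
```

In this sketch, `convert_record([1, 2, 3])` returns `[[1, 2, 3]]`, and a record with no occurrences of the field yields `[[]]` rather than a null, which matches the "required list of required elements" reading of the spec.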
Diffstat (limited to 'sql/core/src/test/resources')
-rw-r--r-- sql/core/src/test/resources/nested-array-struct.parquet bin 0 -> 775 bytes
-rw-r--r-- sql/core/src/test/resources/old-repeated-int.parquet bin 0 -> 389 bytes
-rw-r--r-- sql/core/src/test/resources/old-repeated-message.parquet bin 0 -> 600 bytes
-rw-r--r-- sql/core/src/test/resources/old-repeated.parquet bin 0 -> 432 bytes
-rw-r--r-- [-rwxr-xr-x] sql/core/src/test/resources/parquet-thrift-compat.snappy.parquet bin 10550 -> 10550 bytes
-rw-r--r-- sql/core/src/test/resources/proto-repeated-string.parquet bin 0 -> 411 bytes
-rw-r--r-- sql/core/src/test/resources/proto-repeated-struct.parquet bin 0 -> 608 bytes
-rw-r--r-- sql/core/src/test/resources/proto-struct-with-array-many.parquet bin 0 -> 802 bytes
-rw-r--r-- sql/core/src/test/resources/proto-struct-with-array.parquet bin 0 -> 1576 bytes
9 files changed, 0 insertions, 0 deletions
diff --git a/sql/core/src/test/resources/nested-array-struct.parquet b/sql/core/src/test/resources/nested-array-struct.parquet
new file mode 100644
index 0000000000..41a43fa35d
--- /dev/null
+++ b/sql/core/src/test/resources/nested-array-struct.parquet
Binary files differ
diff --git a/sql/core/src/test/resources/old-repeated-int.parquet b/sql/core/src/test/resources/old-repeated-int.parquet
new file mode 100644
index 0000000000..520922f73e
--- /dev/null
+++ b/sql/core/src/test/resources/old-repeated-int.parquet
Binary files differ
diff --git a/sql/core/src/test/resources/old-repeated-message.parquet b/sql/core/src/test/resources/old-repeated-message.parquet
new file mode 100644
index 0000000000..548db99162
--- /dev/null
+++ b/sql/core/src/test/resources/old-repeated-message.parquet
Binary files differ
diff --git a/sql/core/src/test/resources/old-repeated.parquet b/sql/core/src/test/resources/old-repeated.parquet
new file mode 100644
index 0000000000..213f1a9029
--- /dev/null
+++ b/sql/core/src/test/resources/old-repeated.parquet
Binary files differ
diff --git a/sql/core/src/test/resources/parquet-thrift-compat.snappy.parquet b/sql/core/src/test/resources/parquet-thrift-compat.snappy.parquet
index 837e4876ee..837e4876ee 100755..100644
--- a/sql/core/src/test/resources/parquet-thrift-compat.snappy.parquet
+++ b/sql/core/src/test/resources/parquet-thrift-compat.snappy.parquet
Binary files differ
diff --git a/sql/core/src/test/resources/proto-repeated-string.parquet b/sql/core/src/test/resources/proto-repeated-string.parquet
new file mode 100644
index 0000000000..8a7eea601d
--- /dev/null
+++ b/sql/core/src/test/resources/proto-repeated-string.parquet
Binary files differ
diff --git a/sql/core/src/test/resources/proto-repeated-struct.parquet b/sql/core/src/test/resources/proto-repeated-struct.parquet
new file mode 100644
index 0000000000..c29eee35c3
--- /dev/null
+++ b/sql/core/src/test/resources/proto-repeated-struct.parquet
Binary files differ
diff --git a/sql/core/src/test/resources/proto-struct-with-array-many.parquet b/sql/core/src/test/resources/proto-struct-with-array-many.parquet
new file mode 100644
index 0000000000..ff9809675f
--- /dev/null
+++ b/sql/core/src/test/resources/proto-struct-with-array-many.parquet
Binary files differ
diff --git a/sql/core/src/test/resources/proto-struct-with-array.parquet b/sql/core/src/test/resources/proto-struct-with-array.parquet
new file mode 100644
index 0000000000..325a8370ad
--- /dev/null
+++ b/sql/core/src/test/resources/proto-struct-with-array.parquet
Binary files differ