author | Takuya UESHIN <ueshin@happy-camper.st> | 2014-08-26 18:28:41 -0700 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2014-08-26 18:28:41 -0700 |
commit | 727cb25bcc29481d6b744abef1ca091e64b5f91f (patch) | |
tree | 4edc54a23a8f4581e931ced97ddda2a7a5da4085 /docs/running-on-mesos.md | |
parent | 73b3089b8d2901dab11bb1ef6f46c29625b677fe (diff) | |
[SPARK-3036][SPARK-3037][SQL] Add MapType/ArrayType containing null value support to Parquet.
JIRA:
- https://issues.apache.org/jira/browse/SPARK-3036
- https://issues.apache.org/jira/browse/SPARK-3037
Currently this uses the following Parquet schema for `MapType` when `valueContainsNull` is `true`:
```
message root {
  optional group a (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required int32 key;
      optional int32 value;
    }
  }
}
```
and for `ArrayType` when `containsNull` is `true`:
```
message root {
optional group a (LIST) {
repeated group bag {
optional int32 array;
}
}
}
```
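To make the two layouts above easier to compare, here is a minimal sketch that renders them from a flag. It is not the converter code in this PR: the helper names (`group`, `map_schema`, `array_schema`, `message`) are hypothetical, the element types are fixed to `int32` as in the examples, and the column is always emitted as `optional`. The layout for the non-nullable case is not shown in this message, so the sketch refuses to guess it.

```python
INDENT = "  "


def group(header, body):
    """Wrap already-rendered lines in a Parquet group with the given header."""
    return [header + " {"] + [INDENT + line for line in body] + ["}"]


def map_schema(name, value_contains_null):
    """Render the MapType layout quoted above (int32 keys and values only)."""
    if not value_contains_null:
        # Assumption: only the nullable layout is described in this message.
        raise NotImplementedError("layout for valueContainsNull=false is not shown")
    entry = group(
        "repeated group map (MAP_KEY_VALUE)",
        ["required int32 key;", "optional int32 value;"],
    )
    return group(f"optional group {name} (MAP)", entry)


def array_schema(name, contains_null):
    """Render the ArrayType layout quoted above (int32 elements only)."""
    if not contains_null:
        # Assumption: only the nullable layout is described in this message.
        raise NotImplementedError("layout for containsNull=false is not shown")
    bag = group("repeated group bag", ["optional int32 array;"])
    return group(f"optional group {name} (LIST)", bag)


def message(fields):
    """Wrap rendered field lines in the top-level `message root` block."""
    return "\n".join(group("message root", fields))


if __name__ == "__main__":
    print(message(map_schema("a", True)))
    print(message(array_schema("a", True)))
```

In both layouts the null-capable slot is the innermost `optional` field (`value` for maps, `array` for lists); the extra `bag` group is what gives the list elements a place to be marked optional.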
We have to think about compatibility with older versions of Spark, Hive, and the other systems I mentioned in the JIRA issues.
Notice:
This PR is based on #1963 and #1889.
Please check them first.
/cc marmbrus, yhuai
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #2032 from ueshin/issues/SPARK-3036_3037 and squashes the following commits:
4e8e9e7 [Takuya UESHIN] Add ArrayType containing null value support to Parquet.
013c2ca [Takuya UESHIN] Add MapType containing null value support to Parquet.
62989de [Takuya UESHIN] Merge branch 'issues/SPARK-2969' into issues/SPARK-3036_3037
8e38b53 [Takuya UESHIN] Merge branch 'issues/SPARK-3063' into issues/SPARK-3036_3037