author | Dongjoon Hyun <dongjoon@apache.org> | 2016-06-10 12:43:27 -0700 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2016-06-10 12:43:27 -0700 |
commit | 2413fce9d6812a91eeffb4435c2b5b361d23214b (patch) | |
tree | 91d746b031e47725765e5e82f314de3970f56132 /README.md | |
parent | 7d7a0a5e0749909e97d90188707cc9220a1bb73a (diff) | |
[SPARK-15743][SQL] Prevent saving with all-column partitioning
## What changes were proposed in this pull request?
When saving datasets to storage, `partitionBy` provides an easy way to construct the directory structure. However, if a user chooses all columns as partition columns, exceptions occur.
- **ORC with all-column partitioning**: `AnalysisException` on a **future read** due to schema inference failure.
```scala
scala> spark.range(10).write.format("orc").mode("overwrite").partitionBy("id").save("/tmp/data")
scala> spark.read.format("orc").load("/tmp/data").collect()
org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC at /tmp/data. It must be specified manually;
```
- **Parquet with all-column partitioning**: `InvalidSchemaException` on **write execution** due to a Parquet limitation.
```scala
scala> spark.range(100).write.format("parquet").mode("overwrite").partitionBy("id").save("/tmp/data")
[Stage 0:> (0 + 8) / 8]16/06/02 16:51:17 ERROR Utils: Aborting task
org.apache.parquet.schema.InvalidSchemaException: A group type can not be empty. Parquet does not support empty group without leaves. Empty group: spark_schema
... (lots of error messages)
```
Although some formats, such as JSON, support all-column partitioning without any problem, creating lots of empty directories does not seem like a good idea.
This PR prevents saving with all-column partitioning by consistently raising `AnalysisException` before the save operation is executed.
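The kind of check described above can be sketched in plain Scala, without a Spark dependency. This is an illustration only, not the code in this patch: `PartitionCheck`, `validate`, and `AllColumnPartitioningException` are hypothetical names (the latter stands in for Spark's `AnalysisException`), and the case-sensitive column comparison is an assumption.

```scala
// Hypothetical stand-in for Spark's AnalysisException, so the sketch runs without Spark.
final class AllColumnPartitioningException(message: String) extends Exception(message)

object PartitionCheck {
  // Rejects a write whose partition columns cover every column of the schema,
  // so the failure happens before any data is written.
  // Assumption: column names are compared case-sensitively.
  def validate(schemaColumns: Seq[String], partitionColumns: Seq[String]): Unit = {
    // Columns that would remain as data columns once partition columns are removed.
    val dataColumns = schemaColumns.filterNot(partitionColumns.contains)
    if (schemaColumns.nonEmpty && dataColumns.isEmpty) {
      throw new AllColumnPartitioningException(
        s"Cannot use all columns for partition columns: ${partitionColumns.mkString(", ")}")
    }
  }
}
```

With this sketch, `PartitionCheck.validate(Seq("id"), Seq("id"))` throws, which mirrors the `spark.range(10)...partitionBy("id")` cases above, while `PartitionCheck.validate(Seq("id", "name"), Seq("id"))` returns normally because `name` is still a data column.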
## How was this patch tested?
Newly added `PartitioningUtilsSuite`.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #13486 from dongjoon-hyun/SPARK-15743.