author: Dongjoon Hyun <dongjoon@apache.org> 2016-06-10 12:43:27 -0700
committer: Michael Armbrust <michael@databricks.com> 2016-06-10 12:43:27 -0700
commit: 2413fce9d6812a91eeffb4435c2b5b361d23214b
tree: 91d746b031e47725765e5e82f314de3970f56132 /core
parent: 7d7a0a5e0749909e97d90188707cc9220a1bb73a
[SPARK-15743][SQL] Prevent saving with all-column partitioning
## What changes were proposed in this pull request?

When saving datasets on storage, `partitionBy` provides an easy way to construct the directory structure. However, if a user chooses all columns as partition columns, exceptions occur.

- **ORC with all-column partitioning**: `AnalysisException` on **future read** due to schema inference failure.

```scala
scala> spark.range(10).write.format("orc").mode("overwrite").partitionBy("id").save("/tmp/data")
scala> spark.read.format("orc").load("/tmp/data").collect()
org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC at /tmp/data. It must be specified manually;
```

- **Parquet with all-column partitioning**: `InvalidSchemaException` on **write execution** due to a Parquet limitation.

```scala
scala> spark.range(100).write.format("parquet").mode("overwrite").partitionBy("id").save("/tmp/data")
[Stage 0:> (0 + 8) / 8]16/06/02 16:51:17 ERROR Utils: Aborting task
org.apache.parquet.schema.InvalidSchemaException: A group type can not be empty. Parquet does not support empty group without leaves. Empty group: spark_schema
... (lots of error messages)
```

Although some formats like JSON support all-column partitioning without any problem, creating lots of empty directories does not seem like a good idea.

This PR prevents saving with all-column partitioning by consistently raising `AnalysisException` before executing the save operation.

## How was this patch tested?

Newly added `PartitioningUtilsSuite`.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #13486 from dongjoon-hyun/SPARK-15743.
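
For illustration, the kind of pre-save check described above can be sketched in plain Scala without any Spark dependency. This is not the code from this commit: `PartitionCheck`, `validate`, and the error message are hypothetical names chosen for the example, and the sketch returns an `Either` instead of raising `AnalysisException`.

```scala
// Hypothetical sketch of an all-column partitioning check; not Spark's actual implementation.
object PartitionCheck {
  /**
   * Returns Left(message) when every schema column is also a partition column,
   * which would leave no data columns to write into the files; Right(()) otherwise.
   */
  def validate(schemaColumns: Seq[String], partitionColumns: Seq[String]): Either[String, Unit] = {
    val partitionSet = partitionColumns.toSet
    // Columns that would remain in the data files after partition columns move into the directory path.
    val dataColumns = schemaColumns.filterNot(partitionSet.contains)
    if (schemaColumns.nonEmpty && dataColumns.isEmpty) {
      Left("Cannot use all columns as partition columns") // illustrative message only
    } else {
      Right(())
    }
  }
}
```

With this sketch, `PartitionCheck.validate(Seq("id"), Seq("id"))` yields a `Left`, mirroring the `spark.range(10)` cases above where `id` is the only column, while a schema with at least one non-partition column passes.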
Diffstat (limited to 'core')
0 files changed, 0 insertions, 0 deletions