path: root/python
author     gatorsmile <gatorsmile@gmail.com>          2016-01-25 13:38:09 -0800
committer  Michael Armbrust <michael@databricks.com>  2016-01-25 13:38:09 -0800
commit     9348431da212ec3ab7be2b8e89a952a48b4e2a31 (patch)
tree       73081cab9e62beaf17ca98783556e7b20eae6470 /python
parent     00026fa9912ecee5637f1e7dd222f977f31f6766 (diff)
[SPARK-12975][SQL] Throwing Exception when Bucketing Columns are part of Partitioning Columns
When users use `partitionBy` and `bucketBy` at the same time, some bucketing columns may also appear among the partitioning columns. For example:

```
df.write
  .format(source)
  .partitionBy("i")
  .bucketBy(8, "i", "k")
  .saveAsTable("bucketed_table")
```

In the above case, adding column `i` to `bucketBy` is useless; it only wastes extra CPU when reading or writing bucketed tables. Thus, like Hive, we can throw an exception and let users make the change. Also added a test case that checks whether the `sortBy` and `bucketBy` column information is correctly saved in the metastore table.

Could you check if my understanding is correct? cloud-fan rxin marmbrus Thanks!

Author: gatorsmile <gatorsmile@gmail.com>

Closes #10891 from gatorsmile/commonKeysInPartitionByBucketBy.
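For illustration only (not part of the original commit): a minimal Scala sketch of the usage this change encourages, with bucketing columns kept disjoint from the partition column, and the rejected overlapping form shown commented out. It assumes the modern `SparkSession` entry point and a Hive-backed catalog for `saveAsTable`; the table and column names and the `parquet` source are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object BucketByPartitionBySketch {
  def main(args: Array[String]): Unit = {
    // Assumption: SparkSession API; the original commit predates this entry point.
    val spark = SparkSession.builder()
      .appName("bucketBy-partitionBy sketch")
      .enableHiveSupport()                     // saveAsTable on a bucketed table needs a catalog
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a", 10L), (2, "b", 20L)).toDF("i", "j", "k")

    // Fine after SPARK-12975: bucketing columns ("j", "k") do not overlap
    // with the partitioning column ("i").
    df.write
      .format("parquet")
      .partitionBy("i")
      .bucketBy(8, "j", "k")
      .saveAsTable("bucketed_table")

    // Rejected after SPARK-12975: "i" appears in both partitionBy and bucketBy,
    // so Spark raises an analysis-time error instead of silently wasting work.
    // df.write.format("parquet").partitionBy("i").bucketBy(8, "i", "k")
    //   .saveAsTable("bad_bucketed_table")

    spark.stop()
  }
}
```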
Diffstat (limited to 'python')
0 files changed, 0 insertions, 0 deletions