path: root/python
author     gatorsmile <gatorsmile@gmail.com>          2016-01-25 13:38:09 -0800
committer  Michael Armbrust <michael@databricks.com>  2016-01-25 13:38:09 -0800
commit     9348431da212ec3ab7be2b8e89a952a48b4e2a31 (patch)
tree       73081cab9e62beaf17ca98783556e7b20eae6470 /python
parent     00026fa9912ecee5637f1e7dd222f977f31f6766 (diff)
[SPARK-12975][SQL] Throwing Exception when Bucketing Columns are part of Partitioning Columns
When users use `partitionBy` and `bucketBy` at the same time, some bucketing columns may also appear among the partitioning columns. For example:

```
df.write
  .format(source)
  .partitionBy("i")
  .bucketBy(8, "i", "k")
  .saveAsTable("bucketed_table")
```

In the above case, adding column `i` to `bucketBy` is useless; it only wastes extra CPU when reading or writing bucketed tables. Thus, like Hive, we can throw an exception and let users make the change. Also added a test case that checks whether the `sortBy` and `bucketBy` column information is correctly saved in the metastore table.

Could you check if my understanding is correct? cloud-fan rxin marmbrus Thanks!

Author: gatorsmile <gatorsmile@gmail.com>

Closes #10891 from gatorsmile/commonKeysInPartitionByBucketBy.
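For illustration only (not part of the original commit): a minimal Scala sketch of the usage this change encourages, with bucketing columns kept disjoint from the partition column, and the rejected overlapping form shown commented out. It assumes the modern `SparkSession` entry point and a Hive-backed catalog for `saveAsTable`; the table and column names and the `parquet` source are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object BucketByPartitionBySketch {
  def main(args: Array[String]): Unit = {
    // Assumption: SparkSession API; the original commit predates this entry point.
    val spark = SparkSession.builder()
      .appName("bucketBy-partitionBy sketch")
      .enableHiveSupport()                     // saveAsTable on a bucketed table needs a catalog
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a", 10L), (2, "b", 20L)).toDF("i", "j", "k")

    // Fine after SPARK-12975: bucketing columns ("j", "k") do not overlap
    // with the partitioning column ("i").
    df.write
      .format("parquet")
      .partitionBy("i")
      .bucketBy(8, "j", "k")
      .saveAsTable("bucketed_table")

    // Rejected after SPARK-12975: "i" appears in both partitionBy and bucketBy,
    // so Spark raises an analysis-time error instead of silently wasting work.
    // df.write.format("parquet").partitionBy("i").bucketBy(8, "i", "k")
    //   .saveAsTable("bad_bucketed_table")

    spark.stop()
  }
}
```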
Diffstat (limited to 'python')
0 files changed, 0 insertions, 0 deletions