author gatorsmile <gatorsmile@gmail.com> 2017-01-18 02:01:30 +0800
committer Wenchen Fan <wenchen@databricks.com> 2017-01-18 02:01:30 +0800
commit a23debd7bc8f85ea49c54b8cf3cd112cf0a803ff
tree abb2b167618351b3181ed4956dd93daf356c7359 /python/pyspark/sql/readwriter.py
parent a83accfcfd6a92afac5040c50577258ab83d10dd
[SPARK-19129][SQL] SessionCatalog: Disallow empty part col values in partition spec
### What changes were proposed in this pull request?

Empty partition column values are not valid in a partition specification. Before this PR, we allowed users to specify them; however, the Hive metastore does not detect or disallow them either. As a result, users hit the following strange behavior:

```Scala
// Run in spark-shell, where `spark` is the predefined SparkSession.
val df = spark.createDataFrame(Seq((0, "a"), (1, "b"))).toDF("partCol1", "name")
df.write.mode("overwrite").partitionBy("partCol1").saveAsTable("partitionedTable")
// The spec names the only partition column, but its value is empty.
spark.sql("alter table partitionedTable drop partition(partCol1='')")
spark.table("partitionedTable").show()
```

In the above example, the WHOLE table is DROPPED when users specify a partition spec containing only one partition column with an empty value. When the table has more than one partition column, the Hive metastore APIs simply ignore the columns with empty values and treat the remainder as a partial spec, which is also unexpected. Neither behavior matches the actual Hive behavior.

This PR disallows users from specifying such an invalid partition spec in the `SessionCatalog` APIs.

### How was this patch tested?

Added test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16583 from gatorsmile/disallowEmptyPartColValue.
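The failure mode described in the commit message can be modeled in a few lines of plain Scala. This is a simplified sketch under stated assumptions, not code from Spark or the Hive metastore: a partition spec is represented as a `Map[String, String]` from column name to value, the hypothetical `metastoreMatches` helper models the described metastore behavior (drop empty-valued columns, then treat the rest as a partial spec), and the hypothetical `requireNonEmptyValues` helper shows the kind of up-front check this PR adds to `SessionCatalog`.

```scala
// Hypothetical model for illustration only; the names below are not Spark or Hive APIs.
object PartitionSpecCheck {
  // Assumption: a partition spec maps partition column names to values.
  type PartitionSpec = Map[String, String]

  // Models the metastore behavior described above: columns with empty values
  // are silently ignored, and the remaining entries act as a partial spec.
  def metastoreMatches(spec: PartitionSpec, partition: PartitionSpec): Boolean = {
    val effective = spec.filter { case (_, v) => v.nonEmpty }
    // An empty effective spec matches every partition (forall on empty is true).
    effective.forall { case (k, v) => partition.get(k).contains(v) }
  }

  // Rejects any spec that contains an empty partition column value,
  // before it can reach the metastore.
  def requireNonEmptyValues(spec: PartitionSpec): Unit = {
    val emptyCols = spec.collect { case (k, v) if v.isEmpty => k }
    if (emptyCols.nonEmpty) {
      val rendered = spec.map { case (k, v) => s"$k=$v" }.mkString("[", ", ", "]")
      throw new IllegalArgumentException(
        s"Partition spec is invalid. The spec $rendered contains an empty partition column value")
    }
  }

  def main(args: Array[String]): Unit = {
    val partitions: Seq[PartitionSpec] = Seq(Map("partCol1" -> "0"), Map("partCol1" -> "1"))
    val badSpec: PartitionSpec = Map("partCol1" -> "")

    // Under the modeled behavior, the empty spec matches every partition.
    println(partitions.count(p => metastoreMatches(badSpec, p))) // 2

    // With the check in place, the same spec is rejected up front.
    val rejected =
      try { requireNonEmptyValues(badSpec); false }
      catch { case _: IllegalArgumentException => true }
    println(rejected) // true
  }
}
```

The design choice the model highlights is where to validate: rejecting the spec in the catalog layer avoids depending on how any particular metastore interprets an empty value.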
Diffstat (limited to 'python/pyspark/sql/readwriter.py')
0 files changed, 0 insertions, 0 deletions