| field | value | date |
|---|---|---|
| author | gatorsmile <gatorsmile@gmail.com> | 2016-12-20 23:40:02 -0800 |
| committer | Reynold Xin <rxin@databricks.com> | 2016-12-20 23:40:02 -0800 |
| commit | 24c0c94128770be9034fb69518713d7f6aa1e041 | |
| tree | f367c32b5005da96c0634ba5e1f8337e5d0aa86e | |
| parent | b2dd8ec6b2c05c996e2d7c0bf8db0073c1ee0b94 | |
[SPARK-18949][SQL] Add recoverPartitions API to Catalog
### What changes were proposed in this pull request?
Currently, the only way to recover all the partitions in a table's directory and update the catalog is through SQL: `MSCK REPAIR TABLE` or `ALTER TABLE table RECOVER PARTITIONS`. (Honestly, `MSCK` is very hard for me to remember, and I have no clue what it stands for.)
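Conceptually, recovering partitions means scanning the table's directory for Hive-style `column=value` subdirectories and registering every partition the catalog does not know about yet. The stdlib-only sketch below illustrates that idea under simplifying assumptions. It is not Spark's implementation, which handles more cases, such as escaped characters in path names. `discover_partitions`, `recover_partitions`, and the in-memory set that stands in for the catalog are all hypothetical names.

```python
import os
import tempfile


def discover_partitions(table_dir, partition_cols):
    """Return every partition spec found under table_dir.

    A spec is a tuple of (column, value) pairs in partition-column order,
    e.g. (("year", "2016"), ("month", "12")).
    """
    found = set()

    def walk(path, depth, spec):
        if depth == len(partition_cols):
            # Reached a leaf directory: one complete partition.
            found.add(tuple(spec))
            return
        expected_col = partition_cols[depth]
        for name in sorted(os.listdir(path)):
            child = os.path.join(path, name)
            # Skip plain files and anything not named "<column>=<value>".
            if not os.path.isdir(child) or "=" not in name:
                continue
            col, value = name.split("=", 1)
            if col != expected_col:
                continue
            walk(child, depth + 1, spec + [(col, value)])

    walk(table_dir, 0, [])
    return found


def recover_partitions(table_dir, partition_cols, catalog_partitions):
    """Register on-disk partitions missing from catalog_partitions; return the new ones."""
    missing = discover_partitions(table_dir, partition_cols) - catalog_partitions
    catalog_partitions |= missing  # mutates the caller's set in place
    return missing


# Demo: two partitions exist on disk, but the catalog only knows about one.
with tempfile.TemporaryDirectory() as root:
    for year, month in [("2016", "11"), ("2016", "12")]:
        os.makedirs(os.path.join(root, "year=" + year, "month=" + month))
    catalog = {(("year", "2016"), ("month", "11"))}
    added = recover_partitions(root, ["year", "month"], catalog)

print(sorted(added))  # [(('year', '2016'), ('month', '12'))]
```

Data written directly under `year=2016/month=12` stays invisible to queries until a step like this registers the partition, which is why a repair call matters.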
With the new "Scalable Partition Handling", table repair becomes much more important: it is what makes the data in a newly created data source partitioned table visible.
Thus, this PR adds it to the Catalog interface. After this PR, users can repair a table with:
```Scala
spark.catalog.recoverPartitions("testTable")
```
### How was this patch tested?
Modified the existing test cases.
Author: gatorsmile <gatorsmile@gmail.com>
Closes #16356 from gatorsmile/repairTable.
Diffstat (limited to 'python')
-rw-r--r-- | python/pyspark/sql/catalog.py | 5 |
1 files changed, 5 insertions, 0 deletions
```diff
diff --git a/python/pyspark/sql/catalog.py b/python/pyspark/sql/catalog.py
index a36d02e0db..30c7a3fe4f 100644
--- a/python/pyspark/sql/catalog.py
+++ b/python/pyspark/sql/catalog.py
@@ -258,6 +258,11 @@ class Catalog(object):
         """Invalidate and refresh all the cached metadata of the given table."""
         self._jcatalog.refreshTable(tableName)
 
+    @since('2.1.1')
+    def recoverPartitions(self, tableName):
+        """Recover all the partitions of the given table and update the catalog."""
+        self._jcatalog.recoverPartitions(tableName)
+
     def _reset(self):
         """(Internal use only) Drop all existing databases (except "default"), tables,
         partitions and functions, and set the current database to "default".
```
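The Python method added here is a thin wrapper. It forwards the table name to the JVM-side `Catalog` through the `_jcatalog` handle and relies on `@since` to record the version in the docstring. The sketch below shows that delegation pattern without needing Spark. `FakeJCatalog` is a hypothetical stub for the Py4J handle. The `since` shown is a simplified stand-in that skips pyspark's docstring-indentation handling. The real `Catalog` obtains `_jcatalog` from the Spark session rather than taking it as a constructor argument.

```python
def since(version):
    """Simplified stand-in for pyspark's `since`: append a versionadded note to the docstring."""
    def decorator(f):
        f.__doc__ = (f.__doc__ or "").rstrip() + "\n\n.. versionadded:: %s" % version
        return f
    return decorator


class FakeJCatalog(object):
    """Hypothetical stub for the Py4J handle to the JVM-side Catalog."""

    def __init__(self):
        self.calls = []

    def recoverPartitions(self, tableName):
        # The real JVM method scans the table directory and updates the catalog;
        # this stub only records that it was called.
        self.calls.append(("recoverPartitions", tableName))


class Catalog(object):
    def __init__(self, jcatalog):
        self._jcatalog = jcatalog

    @since('2.1.1')
    def recoverPartitions(self, tableName):
        """Recover all the partitions of the given table and update the catalog."""
        self._jcatalog.recoverPartitions(tableName)


jcatalog = FakeJCatalog()
catalog = Catalog(jcatalog)
catalog.recoverPartitions("testTable")
print(jcatalog.calls)  # [('recoverPartitions', 'testTable')]
```

Keeping the Python side this thin means partition discovery and catalog updates run only on the JVM, so the Scala and Python APIs cannot drift apart in behavior.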