diff options
author | Eric Liang <ekl@databricks.com> | 2016-11-29 20:06:39 -0800 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-11-29 20:06:39 -0800 |
commit | 489845f3a0e2a3555b96b6f3dbb984c783b20d97 (patch) | |
tree | e3f1990e36337a16c44d1437bea87324687b13b3 /docs/sql-programming-guide.md | |
parent | af9789a4f5d00b3141f102e9f0ca52217e26c082 (diff) | |
download | spark-489845f3a0e2a3555b96b6f3dbb984c783b20d97.tar.gz spark-489845f3a0e2a3555b96b6f3dbb984c783b20d97.tar.bz2 spark-489845f3a0e2a3555b96b6f3dbb984c783b20d97.zip |
[SPARK-18145] Update documentation for hive partition management in 2.1
## What changes were proposed in this pull request?
This documents the partition handling changes for Spark 2.1 and how to migrate existing tables.
## How was this patch tested?
Built docs locally.
rxin
Author: Eric Liang <ekl@databricks.com>
Closes #16074 from ericl/spark-18145.
Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r-- | docs/sql-programming-guide.md | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 3adbe23f0a..c7ad06c639 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -1331,6 +1331,15 @@ options. # Migration Guide +## Upgrading From Spark SQL 2.0 to 2.1 + + - Datasource tables now store partition metadata in the Hive metastore. This means that Hive DDLs such as `ALTER TABLE PARTITION ... SET LOCATION` are now available for tables created with the Datasource API. + - Legacy datasource tables can be migrated to this format via the `MSCK REPAIR TABLE` command. Migrating legacy tables is recommended to take advantage of Hive DDL support and improved planning performance. + - To determine if a table has been migrated, look for the `PartitionProvider: Catalog` attribute when issuing `DESCRIBE FORMATTED` on the table. + - Changes to `INSERT OVERWRITE TABLE ... PARTITION ...` behavior for Datasource tables. + - In prior Spark versions `INSERT OVERWRITE` overwrote the entire Datasource table, even when given a partition specification. Now only partitions matching the specification are overwritten. + - Note that this still differs from the behavior of Hive tables, which is to overwrite only partitions overlapping with newly inserted data. + ## Upgrading From Spark SQL 1.6 to 2.0 - `SparkSession` is now the new entry point of Spark that replaces the old `SQLContext` and |