aboutsummaryrefslogtreecommitdiff
path: root/docs/sql-programming-guide.md
diff options
context:
space:
mode:
authorEric Liang <ekl@databricks.com>2016-11-29 20:06:39 -0800
committerReynold Xin <rxin@databricks.com>2016-11-29 20:06:39 -0800
commit489845f3a0e2a3555b96b6f3dbb984c783b20d97 (patch)
treee3f1990e36337a16c44d1437bea87324687b13b3 /docs/sql-programming-guide.md
parentaf9789a4f5d00b3141f102e9f0ca52217e26c082 (diff)
downloadspark-489845f3a0e2a3555b96b6f3dbb984c783b20d97.tar.gz
spark-489845f3a0e2a3555b96b6f3dbb984c783b20d97.tar.bz2
spark-489845f3a0e2a3555b96b6f3dbb984c783b20d97.zip
[SPARK-18145] Update documentation for hive partition management in 2.1
## What changes were proposed in this pull request? This documents the partition handling changes for Spark 2.1 and how to migrate existing tables. ## How was this patch tested? Built docs locally. rxin Author: Eric Liang <ekl@databricks.com> Closes #16074 from ericl/spark-18145.
Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r--docs/sql-programming-guide.md9
1 files changed, 9 insertions, 0 deletions
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 3adbe23f0a..c7ad06c639 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1331,6 +1331,15 @@ options.
# Migration Guide
+## Upgrading From Spark SQL 2.0 to 2.1
+
+ - Datasource tables now store partition metadata in the Hive metastore. This means that Hive DDLs such as `ALTER TABLE PARTITION ... SET LOCATION` are now available for tables created with the Datasource API.
+ - Legacy datasource tables can be migrated to this format via the `MSCK REPAIR TABLE` command. Migrating legacy tables is recommended to take advantage of Hive DDL support and improved planning performance.
+ - To determine if a table has been migrated, look for the `PartitionProvider: Catalog` attribute when issuing `DESCRIBE FORMATTED` on the table.
+ - Changes to `INSERT OVERWRITE TABLE ... PARTITION ...` behavior for Datasource tables.
+ - In prior Spark versions `INSERT OVERWRITE` overwrote the entire Datasource table, even when given a partition specification. Now only partitions matching the specification are overwritten.
+ - Note that this still differs from the behavior of Hive tables, which is to overwrite only partitions overlapping with newly inserted data.
+
## Upgrading From Spark SQL 1.6 to 2.0
- `SparkSession` is now the new entry point of Spark that replaces the old `SQLContext` and