author Eric Liang <ekl@databricks.com> 2016-11-10 17:00:43 -0800
committer Reynold Xin <rxin@databricks.com> 2016-11-10 17:00:43 -0800
commit a3356343cbf58b930326f45721fb4ecade6f8029
tree 67d7c367c0dea780f5c8d3ff78a5525c4e3ad520 /R/pkg/inst
parent e0deee1f7df31177cfc14bbb296f0baa372f473d
[SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE for Datasource tables
## What changes were proposed in this pull request?

As of the current 2.1 code, INSERT OVERWRITE with dynamic partitions against a Datasource table overwrites the entire table instead of only the partitions matching the static keys, as Hive does. It also does not respect custom partition locations.

This PR adds support for all of these operations to Datasource tables managed by the Hive metastore. It is implemented as follows:

- During planning, the full set of partitions affected by an INSERT or OVERWRITE command is read from the Hive metastore.
- The planner identifies any partitions with custom locations and includes them in the write task metadata.
- FileFormatWriter tasks consult this custom-locations map when determining where to write dynamic partition output.
- When the write job finishes, the set of written partitions is compared against the initial set of matched partitions, and the Hive metastore is updated to reflect the newly added and removed partitions.

It was necessary to add a method to `FileCommitProtocol` for staging files with absolute output paths. These files are not handled by the Hadoop output committer; instead, they are moved to their final locations when the job commits.

The overwrite behavior of legacy Datasource tables also changes: the entire table is no longer overwritten when a partial partition spec is present.

cc cloud-fan yhuai

## How was this patch tested?

Unit tests, existing tests.

Author: Eric Liang <ekl@databricks.com>
Author: Wenchen Fan <wenchen@databricks.com>

Closes #15814 from ericl/sc-5027.
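The four write-path steps in the commit message (match partitions at planning time, collect custom locations, resolve each dynamic partition's output directory, reconcile the metastore at job end) can be modeled outside Spark. The following is a minimal Python sketch, not Spark's implementation: the `Spec` tuple representation and the helpers `matching_partitions`, `default_path`, `custom_locations`, `output_dir`, and `metastore_updates` are hypothetical. It assumes Hive-style `col=value` default directories with no path escaping, and overwrite semantics where previously matched partitions that receive no output are dropped from the metastore.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: a partition spec is an ordered tuple of (column, value)
# pairs such as (("year", "2016"), ("month", "11")). This is not Spark's
# TablePartitionSpec; it only keeps the sketch hashable and self-contained.
Spec = Tuple[Tuple[str, str], ...]


def matching_partitions(existing: List[Spec], static: Dict[str, str]) -> Set[Spec]:
    """Planning time: existing partitions that agree with every static key."""
    return {spec for spec in existing
            if all(dict(spec).get(col) == val for col, val in static.items())}


def default_path(table_root: str, spec: Spec) -> str:
    """Hive-style default directory <root>/col1=v1/col2=v2 (no escaping, an assumption)."""
    return "/".join([table_root] + [f"{col}={val}" for col, val in spec])


def custom_locations(matched: Set[Spec], locations: Dict[Spec, str],
                     table_root: str) -> Dict[Spec, str]:
    """Planner: matched partitions whose stored location is not the default one."""
    return {spec: loc for spec, loc in locations.items()
            if spec in matched and loc != default_path(table_root, spec)}


def output_dir(table_root: str, spec: Spec,
               custom: Dict[Spec, str]) -> Tuple[str, bool]:
    """Write task: pick the directory for one dynamic partition.

    The boolean marks an absolute (custom) path; per the PR, such files are
    staged separately and moved into place at job commit instead of going
    through the Hadoop output committer.
    """
    if spec in custom:
        return custom[spec], True
    return default_path(table_root, spec), False


def metastore_updates(matched: Set[Spec], written: Set[Spec],
                      overwrite: bool) -> Tuple[Set[Spec], Set[Spec]]:
    """Job end: partitions to add, and (only when overwriting) partitions to drop."""
    added = written - matched
    dropped = matched - written if overwrite else set()
    return added, dropped


if __name__ == "__main__":
    root = "/warehouse/t"
    oct16 = (("year", "2016"), ("month", "10"))
    nov16 = (("year", "2016"), ("month", "11"))
    dec15 = (("year", "2015"), ("month", "12"))
    dec16 = (("year", "2016"), ("month", "12"))  # not yet in the metastore
    locations = {oct16: default_path(root, oct16),
                 nov16: "/archive/nov",  # custom location
                 dec15: default_path(root, dec15)}

    # Roughly: INSERT OVERWRITE TABLE t PARTITION (year=2016, month) SELECT ...
    matched = matching_partitions([oct16, nov16, dec15], {"year": "2016"})
    custom = custom_locations(matched, locations, root)
    print(output_dir(root, nov16, custom))  # ('/archive/nov', True)
    print(output_dir(root, dec16, custom))  # ('/warehouse/t/year=2016/month=12', False)

    # Suppose the query produced rows only for month=11 and month=12.
    added, dropped = metastore_updates(matched, {nov16, dec16}, overwrite=True)
    print(added == {dec16}, dropped == {oct16})  # True True
```

In the demo, the static key `year=2016` restricts the overwrite to the two 2016 partitions, so the 2015 partition is neither rewritten nor dropped, which is the Hive-compatible behavior the PR describes. The `nov16` partition keeps its custom location because the planner carried it into the write metadata.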
Diffstat (limited to 'R/pkg/inst')
0 files changed, 0 insertions, 0 deletions