aboutsummaryrefslogtreecommitdiff
path: root/NOTICE
diff options
context:
space:
mode:
authorXiao Li <gatorsmile@gmail.com>2017-03-15 10:53:58 +0800
committerWenchen Fan <wenchen@databricks.com>2017-03-15 10:53:58 +0800
commitf9a93b1b4a20e7c72d900362b269edab66e73dd8 (patch)
treef816b258082a9a6f567bda055d1070de6b3ff370 /NOTICE
parentd1f6c64c4b763c05d6d79ae5497f298dc3835f3e (diff)
downloadspark-f9a93b1b4a20e7c72d900362b269edab66e73dd8.tar.gz
spark-f9a93b1b4a20e7c72d900362b269edab66e73dd8.tar.bz2
spark-f9a93b1b4a20e7c72d900362b269edab66e73dd8.zip
[SPARK-18112][SQL] Support reading data from Hive 2.1 metastore
### What changes were proposed in this pull request? This PR is to support reading data from Hive 2.1 metastore. Need to update shim class because of the Hive API changes caused by the following three Hive JIRAs: - [HIVE-12730 MetadataUpdater: provide a mechanism to edit the basic statistics of a table (or a partition)](https://issues.apache.org/jira/browse/HIVE-12730) - [Hive-13341 Stats state is not captured correctly: differentiate load table and create table](https://issues.apache.org/jira/browse/HIVE-13341) - [HIVE-13622 WriteSet tracking optimizations](https://issues.apache.org/jira/browse/HIVE-13622) There are three new fields added to Hive APIs. - `boolean hasFollowingStatsTask`. We always set it to `false`. This is to keep the existing behavior unchanged (starting from 0.13), no matter which Hive metastore client version users choose. If we set it to `true`, the basic table statistics is not collected by Hive. For example, ```SQL CREATE TABLE tbl AS SELECT 1 AS a ``` When setting `hasFollowingStatsTask ` to `false`, the table properties is like ``` Properties: [numFiles=1, transient_lastDdlTime=1489513927, totalSize=2] ``` When setting `hasFollowingStatsTask ` to `true`, the table properties is like ``` Properties: [transient_lastDdlTime=1489513563] ``` - `AcidUtils.Operation operation`. Obviously, we do not support ACID. Thus, we set it to `AcidUtils.Operation.NOT_ACID`. - `EnvironmentContext environmentContext`. So far, this is always set to `null`. This was introduced for supporting DDL `alter table s update statistics set ('numRows'='NaN')`. Using this DDL, users can specify the statistics. So far, our Spark SQL does not need it, because we use different table properties to store our generated statistics values. However, when Spark SQL issues ALTER TABLE DDL statements, Hive metastore always automatically invalidate the Hive-generated statistics. In the follow-up PR, we can fix it by explicitly adding a property to `environmentContext`. ```JAVA putToProperties(StatsSetupConst.STATS_GENERATED, StatsSetupConst.USER) ``` Another alternative is to set `DO_NOT_UPDATE_STATS`to `TRUE`. See the Hive JIRA: https://issues.apache.org/jira/browse/HIVE-15653. We will not address it in this PR. ### How was this patch tested? Added test cases to VersionsSuite.scala Author: Xiao Li <gatorsmile@gmail.com> Closes #17232 from gatorsmile/Hive21.
Diffstat (limited to 'NOTICE')
0 files changed, 0 insertions, 0 deletions