author | Andrew Or <andrew@databricks.com> | 2016-02-21 15:00:24 -0800 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-02-21 15:00:24 -0800 |
commit | 6c3832b26e119626205732b8fd03c8f5ba986896 (patch) | |
tree | c23d83055b66647662414f4a5f835ec30efbe64f /docs | |
parent | 7eb83fefd19e137d80a23b5174b66b14831c291a (diff) | |
[SPARK-13080][SQL] Implement new Catalog API using Hive
## What changes were proposed in this pull request?
This is a step towards merging `SQLContext` and `HiveContext`. A new internal Catalog API was introduced in #10982 and extended in #11069. This patch introduces an implementation of this API using `HiveClient`, an existing interface to Hive. It also extends `HiveClient` with additional calls to Hive that are needed to complete the catalog implementation.
*Where should I start reviewing?* The new catalog introduced is `HiveCatalog`. This class is relatively simple because it just calls `HiveClientImpl`, where most of the new logic is. I would not start with `HiveClient`, `HiveQl`, or `HiveMetastoreCatalog`, which are modified mainly because of a refactor.
*Why is this patch so big?* I had to refactor `HiveClient` to remove an intermediate representation of databases, tables, partitions, etc. After this refactor, `CatalogTable` converts directly to and from `HiveTable` (and likewise for the other representations). Otherwise we would have to first convert `CatalogTable` to the intermediate representation and then convert that to `HiveTable`, which is messy.
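To illustrate the direct conversion described above, here is a minimal sketch. It is not the code from this patch: the `CatalogTable` and `HiveTable` types below are heavily simplified, hypothetical stand-ins, and `TableConversions`, `toHiveTable`, and `fromHiveTable` are names assumed for illustration only.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified, hypothetical stand-in for the catalog-side table description.
// The real class carries many more fields.
final case class CatalogTable(db: String, name: String, columns: Seq[(String, String)])

// Hypothetical stand-in for Hive's mutable, Java-style table object.
final class HiveTable(val dbName: String, val tableName: String) {
  val cols: ArrayBuffer[(String, String)] = ArrayBuffer.empty
}

object TableConversions {
  // Catalog representation -> Hive representation, with no intermediate type.
  def toHiveTable(t: CatalogTable): HiveTable = {
    val h = new HiveTable(t.db, t.name)
    h.cols ++= t.columns
    h
  }

  // Hive representation -> catalog representation.
  def fromHiveTable(h: HiveTable): CatalogTable =
    CatalogTable(h.dbName, h.tableName, h.cols.toList)
}
```

With only two representations there is a single pair of conversion functions per entity, so a round trip through `toHiveTable` and `fromHiveTable` can be checked directly instead of through a third type.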
The new class hierarchy is as follows:
```
org.apache.spark.sql.catalyst.catalog.Catalog
- org.apache.spark.sql.catalyst.catalog.InMemoryCatalog
- org.apache.spark.sql.hive.HiveCatalog
```
Note that, as of this patch, none of these classes are used anywhere yet. That will come in a future patch, before the Spark 2.0 release.
## How was this patch tested?
All existing unit tests, plus `HiveCatalogSuite`, which extends `CatalogTestCases`.
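The shape of the hierarchy above can be sketched roughly as follows. This is an assumption-laden illustration, not the actual API: the three methods on `Catalog`, the reduced `HiveClient` trait, and `StubHiveClient` are all hypothetical and exist only to show one interface with an in-memory implementation and a Hive-backed implementation that delegates to a client.

```scala
import scala.collection.mutable

// Hypothetical, heavily reduced catalog interface (the real one has many more operations).
trait Catalog {
  def createDatabase(name: String): Unit
  def databaseExists(name: String): Boolean
  def listDatabases(): Seq[String]
}

// Keeps all metadata in local memory; no external system is involved.
class InMemoryCatalog extends Catalog {
  private val dbs = mutable.LinkedHashSet.empty[String]
  override def createDatabase(name: String): Unit = { dbs += name }
  override def databaseExists(name: String): Boolean = dbs.contains(name)
  override def listDatabases(): Seq[String] = dbs.toList
}

// Hypothetical, reduced stand-in for the client interface that talks to Hive.
trait HiveClient {
  def createDatabase(name: String): Unit
  def getDatabaseOption(name: String): Option[String]
  def listDatabases(): Seq[String]
}

// Thin implementation: every call is forwarded to the client, which holds the real logic.
class HiveCatalog(client: HiveClient) extends Catalog {
  override def createDatabase(name: String): Unit = client.createDatabase(name)
  override def databaseExists(name: String): Boolean = client.getDatabaseOption(name).isDefined
  override def listDatabases(): Seq[String] = client.listDatabases()
}

// Stub client for local experiments, so the sketch runs without a Hive metastore.
class StubHiveClient extends HiveClient {
  private val dbs = mutable.LinkedHashSet.empty[String]
  override def createDatabase(name: String): Unit = { dbs += name }
  override def getDatabaseOption(name: String): Option[String] = dbs.find(_ == name)
  override def listDatabases(): Seq[String] = dbs.toList
}
```

Because both implementations share one interface, the same set of test cases can be run against either, which is the idea behind a shared test base such as `CatalogTestCases`.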
Author: Andrew Or <andrew@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes #11293 from rxin/hive-catalog.