author    Herman van Hovell <hvanhovell@databricks.com>  2017-02-10 11:06:57 -0800
committer Wenchen Fan <wenchen@databricks.com>  2017-02-10 11:06:57 -0800
commit    de8a03e68202647555e30fffba551f65bc77608d
tree      f529ed7b5fe76475226cef8a99061c0bec235198
parent    dadff5f0789cce7cf3728a8adaab42118e5dc019
[SPARK-19459][SQL] Add Hive datatype (char/varchar) to StructField metadata
## What changes were proposed in this pull request?
Reading from an existing ORC table that contains `char` or `varchar` columns can fail with a `ClassCastException` if the table metadata was created by Spark. This happens because Spark internally replaces `char` and `varchar` columns with a `string` column.
This PR fixes the issue by recording the original Hive type in the `StructField`'s metadata under the `HIVE_TYPE_STRING` key. This key is picked up by the `HiveClient` and the ORC reader; see https://github.com/apache/spark/pull/16060 for more details on how the metadata is used.
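The mechanism can be illustrated with a small, self-contained sketch (not Spark's actual API): a `char`/`varchar` column is stored as `string`, while the original Hive type string is preserved under the metadata key. The helper `to_catalyst_field` and its dict-based field representation are hypothetical, used only to show the idea.

```python
import re

# Assumed metadata key name, mirroring Spark's HIVE_TYPE_STRING constant.
HIVE_TYPE_STRING = "HIVE_TYPE_STRING"

def to_catalyst_field(name, hive_type):
    """Map a Hive column type to a simplified field description.

    char/varchar become "string", but the original Hive type is kept
    in metadata so downstream readers (e.g. the ORC reader) can
    recover it instead of mis-casting the column.
    """
    if re.fullmatch(r"(char|varchar)\(\d+\)", hive_type):
        return {"name": name, "type": "string",
                "metadata": {HIVE_TYPE_STRING: hive_type}}
    # Other types pass through unchanged, with empty metadata.
    return {"name": name, "type": hive_type, "metadata": {}}

field = to_catalyst_field("city", "varchar(50)")
# field["type"] is "string"; field["metadata"] retains "varchar(50)"
```

The key point is that the replacement is lossless: the schema Spark works with uses `string`, but enough information survives in the field metadata for Hive-aware readers to reconstruct the declared type.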
## How was this patch tested?
Added a regression test to `OrcSourceSuite`.
Author: Herman van Hovell <hvanhovell@databricks.com>
Closes #16804 from hvanhovell/SPARK-19459.
Diffstat (limited to 'mllib-local')
0 files changed, 0 insertions, 0 deletions