field | value | date |
---|---|---|
author | Liang-Chi Hsieh <viirya@gmail.com> | 2015-07-08 10:09:50 -0700 |
committer | Cheng Lian <lian@databricks.com> | 2015-07-08 10:09:50 -0700 |
commit | 6722aca809ddc28aa20abf3bbb2e0de8629a9903 (patch) | |
tree | 9e11f3a4da4feebee391bfcaa211536bd7ba6be9 /unsafe | |
parent | bf02e377168f39459d5c216e939097ae5705f573 (diff) | |
[SPARK-8785] [SQL] Improve Parquet schema merging
JIRA: https://issues.apache.org/jira/browse/SPARK-8785
Currently, Parquet schema merging (`ParquetRelation2.readSchema`) may spend a lot of time merging duplicate schemas. We can instead select only the distinct schemas first and then merge just those.
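The idea described above can be illustrated with a minimal sketch. This is plain Python, not the Scala code in this commit: the dict-based schema representation and the names `merge_schemas`, `merge_distinct_schemas`, and `footers` are assumptions made up for illustration, and the actual change in `ParquetRelation2.readSchema` may differ in detail.

```python
from functools import reduce


def merge_schemas(left, right):
    """Merge two schemas (field name -> type name); conflicting types raise."""
    merged = dict(left)
    for name, dtype in right.items():
        if name in merged and merged[name] != dtype:
            raise ValueError(f"conflicting types for field {name!r}")
        merged[name] = dtype
    return merged


def merge_distinct_schemas(schemas):
    """Drop duplicate schemas first, then merge only the distinct ones.

    Returns the merged schema and the number of distinct schemas merged.
    """
    seen = set()
    distinct = []
    for schema in schemas:
        # Hashable fingerprint of the schema (field order is significant here).
        key = tuple(schema.items())
        if key not in seen:
            seen.add(key)
            distinct.append(schema)
    return reduce(merge_schemas, distinct, {}), len(distinct)


# Hypothetical input: many files share one schema, one file adds a column.
footers = [{"id": "int", "name": "string"}] * 1000 + [{"id": "int", "score": "double"}]
merged, distinct_count = merge_distinct_schemas(footers)
```

With 1001 input schemas of which only two are distinct, the comparatively expensive merge step runs over two schemas rather than over all 1001; the deduplication pass itself is a cheap hash lookup per schema.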
Author: Liang-Chi Hsieh <viirya@gmail.com>
Author: Liang-Chi Hsieh <viirya@appier.com>
Closes #7182 from viirya/improve_parquet_merging and squashes the following commits:
5cf934f [Liang-Chi Hsieh] Refactor it to make it faster.
f3411ea [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into improve_parquet_merging
a63c3ff [Liang-Chi Hsieh] Improve Parquet schema merging.
Diffstat (limited to 'unsafe')
0 files changed, 0 insertions, 0 deletions