author | Liang-Chi Hsieh <viirya@gmail.com> | 2015-05-17 15:42:21 +0800 |
---|---|---|
committer | Cheng Lian <lian@databricks.com> | 2015-05-17 15:42:21 +0800 |
commit | 339905578790fa37fcad9684b859b443313a5aa2 (patch) | |
tree | 4c17f064797533b45b7f5f86924691b7319d4b8f /pom.xml | |
parent | edf09ea1bd4bf7692e0085ad9c70cb1bfc8d06d8 (diff) | |
[SPARK-7447] [SQL] Don't re-merge Parquet schema when the relation is deserialized
JIRA: https://issues.apache.org/jira/browse/SPARK-7447
`MetadataCache` in `ParquetRelation2` is annotated as `transient`. When `ParquetRelation2` is deserialized, we ask `MetadataCache` to refresh and perform schema merging again. This is time-consuming, especially when there are many Parquet files.
With the new `FSBasedParquetRelation`, `MetadataCache` is no longer `transient`, but `MetadataCache.refresh()` still performs schema merging again when the relation is deserialized.
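For illustration only, here is a minimal sketch of the general problem described above. It is not code from this patch and does not use Spark's classes. It is written in plain Java because the behavior comes from JVM serialization. All names (`SchemaCacheDemo`, `TransientCacheRelation`, `SerializedCacheRelation`, `mergeSchemas`) are hypothetical, and the "schema" is just a comma-separated column list. A `transient` cached value is dropped on serialization, so the expensive merge runs again on the deserialized copy, while a value that travels with the object is reused.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SchemaCacheDemo {
    // Counts how often the (pretend) expensive merge runs. Static, so never serialized.
    public static int mergeCount = 0;

    // Stand-in for schema merging: union of column names across all files, sorted.
    public static String mergeSchemas(List<String> fileSchemas) {
        mergeCount++;
        TreeSet<String> columns = new TreeSet<>();
        for (String schema : fileSchemas) {
            columns.addAll(Arrays.asList(schema.split(",")));
        }
        return String.join(",", columns);
    }

    public interface Relation extends Serializable {
        String schema();
    }

    // The merged schema is transient: it is lost on serialization and rebuilt afterwards.
    public static class TransientCacheRelation implements Relation {
        private static final long serialVersionUID = 1L;
        private final ArrayList<String> fileSchemas;
        private transient String merged;

        public TransientCacheRelation(List<String> fileSchemas) {
            this.fileSchemas = new ArrayList<>(fileSchemas);
        }

        @Override
        public String schema() {
            if (merged == null) {
                merged = mergeSchemas(fileSchemas);
            }
            return merged;
        }
    }

    // The merged schema is a regular field: it is serialized and reused by the copy.
    public static class SerializedCacheRelation implements Relation {
        private static final long serialVersionUID = 1L;
        private final ArrayList<String> fileSchemas;
        private String merged;

        public SerializedCacheRelation(List<String> fileSchemas) {
            this.fileSchemas = new ArrayList<>(fileSchemas);
        }

        @Override
        public String schema() {
            if (merged == null) {
                merged = mergeSchemas(fileSchemas);
            }
            return merged;
        }
    }

    // Serializes the relation to bytes and reads it back, as shipping it to another JVM would.
    public static Relation roundTrip(Relation relation) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(relation);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Relation) in.readObject();
        }
    }

    // Resolves the schema once, round-trips the relation, resolves it again on the copy,
    // and returns how many merges that took in total.
    public static int mergesAcrossRoundTrip(Relation relation) {
        mergeCount = 0;
        relation.schema();
        try {
            roundTrip(relation).schema();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return mergeCount;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("id,name", "id,age");
        System.out.println(mergesAcrossRoundTrip(new TransientCacheRelation(files)));  // prints 2
        System.out.println(mergesAcrossRoundTrip(new SerializedCacheRelation(files))); // prints 1
    }
}
```

In this toy model the second merge costs nothing, but with very many Parquet files each repeated merge means re-reading and combining per-file schemas, which is the overhead this change is meant to avoid.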
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #6012 from viirya/without_remerge_schema and squashes the following commits:
2663957 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into without_remerge_schema
6ac7d93 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into without_remerge_schema
b0fc09b [Liang-Chi Hsieh] Don't generate and merge parquetSchema multiple times.