[SPARK-8014] [SQL] Avoid premature metadata discovery when writing a HadoopFsRelation with a save mode other than Append - spark


author	Cheng Lian <lian@databricks.com>	2015-06-02 13:32:13 -0700
committer	Yin Huai <yhuai@databricks.com>	2015-06-02 13:32:13 -0700
commit	686a45f0b9c50ede2a80854ed6a155ee8a9a4f5c (patch)
tree	5bf5776140d9906a2cc4c677df3de9313b6effe2 /python/pyspark/sql/column.py
parent	ad06727fe985ca243ebdaaba55cd7d35a4749d0a (diff)

[SPARK-8014] [SQL] Avoid premature metadata discovery when writing a HadoopFsRelation with a save mode other than Append

The current code references the schema of the DataFrame to be written before checking the save mode, which triggers expensive metadata discovery prematurely. For any save mode other than `Append`, this metadata discovery is useless, since we later either ignore the result (for `Ignore` and `ErrorIfExists`) or delete the existing files (for `Overwrite`). This PR fixes the issue by deferring metadata discovery until after the save mode check.

Author: Cheng Lian <lian@databricks.com>

Closes #6583 from liancheng/spark-8014 and squashes the following commits:

1aafabd [Cheng Lian] Updates comments
088abaa [Cheng Lian] Avoids schema merging and partition discovery when data schema and partition schema are defined
8fbd93f [Cheng Lian] Fixes SPARK-8014
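The deferral described in the commit message can be illustrated with a minimal Python sketch. This is not the Spark code changed by this commit: `Relation`, `plan_write`, `discovery_calls`, and the returned action strings are hypothetical names invented for illustration, and the two-column schema is a placeholder. The sketch only shows the ordering idea, assuming that schema access is what triggers the expensive discovery: check the save mode first, and touch the schema only on the `Append` path.

```python
from enum import Enum


class SaveMode(Enum):
    APPEND = "append"
    OVERWRITE = "overwrite"
    ERROR_IF_EXISTS = "error"
    IGNORE = "ignore"


class Relation:
    """Hypothetical stand-in for a file-based relation whose schema is costly to compute."""

    def __init__(self):
        self.discovery_calls = 0  # counts how often discovery actually runs
        self._schema = None

    @property
    def schema(self):
        # Lazily computed: stands in for schema merging and partition discovery.
        if self._schema is None:
            self.discovery_calls += 1
            self._schema = ["id", "name"]  # placeholder result
        return self._schema


def plan_write(relation, mode, path_exists):
    """Pick a write action, reading relation.schema only when appending to existing data."""
    if not path_exists:
        return "write"
    if mode is SaveMode.ERROR_IF_EXISTS:
        raise FileExistsError("path already exists")
    if mode is SaveMode.IGNORE:
        return "skip"
    if mode is SaveMode.OVERWRITE:
        return "delete-then-write"
    # Append is the only mode that needs the existing metadata.
    return "append-with-schema:" + ",".join(relation.schema)


if __name__ == "__main__":
    rel = Relation()
    print(plan_write(rel, SaveMode.OVERWRITE, path_exists=True), rel.discovery_calls)
    print(plan_write(rel, SaveMode.APPEND, path_exists=True), rel.discovery_calls)
```

In this sketch, the counter stays at zero for `Ignore`, `ErrorIfExists`, and `Overwrite`, and increments only once the `Append` branch reads the schema. Referencing `relation.schema` at the top of `plan_write` instead would reproduce the premature discovery that the commit message describes.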

Diffstat (limited to 'python/pyspark/sql/column.py')

0 files changed, 0 insertions, 0 deletions

