[SPARK-19092][SQL] Save() API of DataFrameWriter should not scan all the saved files - spark


author	gatorsmile <gatorsmile@gmail.com>	2017-01-13 13:05:53 +0800
committer	Wenchen Fan <wenchen@databricks.com>	2017-01-13 13:05:53 +0800
commit	3356b8b6a9184fcab8d0fe993f3545c3beaa4d99 (patch)
tree	d61a3ab17d0d263d8f3c7bf7200d6a5ab9bcda5e /mllib
parent	c983267b0853f908d1c671cedd18b159e6993df1 (diff)
download	spark-3356b8b6a9184fcab8d0fe993f3545c3beaa4d99.tar.gz spark-3356b8b6a9184fcab8d0fe993f3545c3beaa4d99.tar.bz2 spark-3356b8b6a9184fcab8d0fe993f3545c3beaa4d99.zip

[SPARK-19092][SQL] Save() API of DataFrameWriter should not scan all the saved files

### What changes were proposed in this pull request?

`DataFrameWriter`'s [save() API](https://github.com/gatorsmile/spark/blob/5d38f09f47a767a342a0a8219c63efa2943b5d1f/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L207) performs an unnecessary full filesystem scan of the saved files. Because save() is the most basic/core API in `DataFrameWriter`, we should avoid this scan.

The related PR: https://github.com/apache/spark/pull/16090

### How was this patch tested?

Updated the existing test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16481 from gatorsmile/saveFileScan.

Diffstat (limited to 'mllib')

0 files changed, 0 insertions, 0 deletions

