[SPARK-18021][SQL] Refactor file name specification for data sources - spark


author	Reynold Xin <rxin@databricks.com>	2016-10-20 12:18:56 -0700
committer	Reynold Xin <rxin@databricks.com>	2016-10-20 12:18:56 -0700
commit	7f9ec19eae60abe589ffd22259a9065e7e353a57 (patch)
tree	304e751a63b5ec83ec4e8fa918573020890f2ae5 /docs/building-spark.md
parent	947f4f25273161dc4719419a35613a71c2e2a150 (diff)

[SPARK-18021][SQL] Refactor file name specification for data sources

## What changes were proposed in this pull request?

Currently each data source OutputWriter is responsible for specifying the entire file name for each file output. This, however, does not make any sense because we rely on file naming schemes for certain behaviors in Spark SQL, e.g. bucket id. The current approach allows individual data sources to break the implementation of bucketing.

On the flip side, we also don't want to move file naming entirely out of data sources, because different data sources do want to specify different extensions.

This patch divides file name specification into two parts: the first part is a prefix specified by the caller of OutputWriter (in WriteOutput), and the second part is the suffix that can be specified by the OutputWriter itself.

Note that a side effect of this change is that now all file based data sources also support bucketing automatically.

There are also some other minor cleanups:

- Removed the UUID passed through generic Configuration string
- Some minor rewrites for better clarity
- Renamed "path" in multiple places to "stagingDir", to more accurately reflect its meaning

## How was this patch tested?

This should be covered by existing data source tests.

Author: Reynold Xin <rxin@databricks.com>

Closes #15562 from rxin/SPARK-18021.
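
The prefix/suffix split described above can be sketched roughly as follows. This is only an illustration in Python, not the code in this patch: the function names (`file_prefix`, `output_file_name`, `bucket_id_of`) and the exact name layout are assumptions made for the example. The point it tries to show is that once the caller owns the prefix, the bucket id can be recovered from any file name regardless of which extension the data source appends.

```python
import re
from typing import Optional


def file_prefix(task_id: int, job_uuid: str, bucket_id: Optional[int] = None) -> str:
    """Caller-owned part of the name (hypothetical layout, not Spark's exact format)."""
    # The bucket id, when present, is encoded by the caller, never by the data source.
    bucket = f"_{bucket_id:05d}" if bucket_id is not None else ""
    return f"part-{task_id:05d}-{job_uuid}{bucket}"


def output_file_name(prefix: str, writer_suffix: str) -> str:
    """Join the caller's prefix with the suffix chosen by the data source writer."""
    return prefix + writer_suffix


def bucket_id_of(file_name: str) -> Optional[int]:
    """Recover the bucket id from a name built by the two functions above."""
    # Drop everything from the first dot on, i.e. the writer-chosen suffix.
    base = file_name.split(".", 1)[0]
    match = re.search(r"_(\d+)$", base)
    return int(match.group(1)) if match else None


# Two hypothetical data sources share one prefix but pick different suffixes.
prefix = file_prefix(task_id=3, job_uuid="abc", bucket_id=7)
parquet_name = output_file_name(prefix, ".snappy.parquet")
json_name = output_file_name(prefix, ".json")
```

Because the suffix is appended after the bucket id, a data source in this sketch can change its extension freely without affecting how the bucket id is parsed.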

Diffstat (limited to 'docs/building-spark.md')

0 files changed, 0 insertions, 0 deletions

