[SPARK-15103][SQL] Refactored FileCatalog class to allow StreamFileCatalog to infer partitioning - spark


author	Tathagata Das <tathagata.das1565@gmail.com>	2016-05-04 11:02:48 -0700
committer	Tathagata Das <tathagata.das1565@gmail.com>	2016-05-04 11:02:48 -0700
commit	0fd3a4748416233f034ec137d95f0a4c8712d396 (patch)
tree	6c370ad0188f01d2d2b9fa9f232791a1743fc6cc /python/pyspark/sql/conf.py
parent	6274a520fa743b7d079fde4a3033da5c3a2532a1 (diff)

## What changes were proposed in this pull request?

The file stream sink writes the list of written files to a metadata log, and StreamFileCatalog reads that list to find the files to process. However, unlike HDFSFileCatalog, StreamFileCatalog does not infer partitioning. This PR enables partition inference by refactoring HDFSFileCatalog to extract an abstract class, PartitioningAwareFileCatalog, which contains all the functionality needed to infer partitions from a list of leaf files.

- HDFSFileCatalog has been renamed to ListingFileCatalog. It extends PartitioningAwareFileCatalog by providing a list of leaf files from a recursive directory scan.
- StreamFileCatalog has been renamed to MetadataLogFileCatalog. It extends PartitioningAwareFileCatalog by providing a list of leaf files from the metadata log.
- The two classes above have been moved into their own files, since they are not interfaces and do not belong in fileSourceInterfaces.scala.

## How was this patch tested?

- FileStreamSinkSuite was updated to check that partitioning is inferred and that, on reading, partitions are pruned correctly based on the query.
- Other unit tests are unchanged and pass as expected.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #12879 from tdas/SPARK-15103.
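The refactoring described in this commit follows a template-method pattern: the abstract parent owns partition inference, and each subclass only decides where the leaf files come from. The following is a minimal Scala sketch of that structure, not Spark's actual code. It assumes leaf files are plain path strings and that partitions are encoded as `key=value` directory names. `PartitionSpec`, `leafFiles`, `partitions`, `prune`, `listRecursively`, and `metadataLog` are hypothetical simplifications and do not match the real `FileCatalog` signatures.

```scala
// Hypothetical, simplified model of the class hierarchy described in the commit.
// Class names mirror the commit message; all members and signatures are illustrative.

final case class PartitionSpec(values: Map[String, String], files: Seq[String])

abstract class PartitioningAwareFileCatalog(rootPath: String) {
  /** Subclasses decide where the leaf files come from. */
  protected def leafFiles: Seq[String]

  /** Parse the "key=value" directory segments between the root and the file name. */
  private def partitionValues(file: String): Map[String, String] = {
    val relative = file.stripPrefix(rootPath).stripPrefix("/")
    relative.split("/").dropRight(1).toSeq.flatMap { segment =>
      segment.split("=", 2) match {
        case Array(k, v) => Some(k -> v)
        case _           => None
      }
    }.toMap
  }

  /** Group leaf files by their inferred partition values (shared by all subclasses). */
  def partitions: Seq[PartitionSpec] =
    leafFiles
      .groupBy(f => partitionValues(f))
      .toSeq
      .map { case (values, files) => PartitionSpec(values, files.sorted) }
      .sortBy(_.values.toSeq.sorted.mkString(","))

  /** Keep only partitions whose values satisfy the predicate (partition pruning). */
  def prune(predicate: Map[String, String] => Boolean): Seq[PartitionSpec] =
    partitions.filter(p => predicate(p.values))
}

/** Leaf files from a recursive directory scan (the scan is passed in as a function here). */
class ListingFileCatalog(rootPath: String, listRecursively: String => Seq[String])
    extends PartitioningAwareFileCatalog(rootPath) {
  override protected def leafFiles: Seq[String] = listRecursively(rootPath)
}

/** Leaf files recorded by the file stream sink, one entry per batch (simulated in memory). */
class MetadataLogFileCatalog(rootPath: String, metadataLog: Seq[Seq[String]])
    extends PartitioningAwareFileCatalog(rootPath) {
  override protected def leafFiles: Seq[String] = metadataLog.flatten
}

object FileCatalogSketch {
  def main(args: Array[String]): Unit = {
    val log = Seq(
      Seq("/out/year=2015/part-0.parquet"),
      Seq("/out/year=2016/part-0.parquet", "/out/year=2016/part-1.parquet"))
    val streamCatalog = new MetadataLogFileCatalog("/out", log)
    val batchCatalog = new ListingFileCatalog("/out", _ => log.flatten)
    // Both catalogs infer the same partitioning because they share the parent's logic.
    println(streamCatalog.partitions == batchCatalog.partitions)
    // A predicate on the partition column keeps only the year=2016 files.
    println(streamCatalog.prune(_.get("year").contains("2016")).flatMap(_.files))
  }
}
```

The design point the sketch illustrates is that once inference lives in the parent, a catalog backed by a metadata log gets partition discovery and pruning for free, with no changes beyond supplying its own list of leaf files.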

