[SPARK-14997][SQL] Fixed FileCatalog to return correct set of files when there is no partitioning scheme in the given paths - spark


author	Tathagata Das <tathagata.das1565@gmail.com>	2016-05-06 15:04:16 -0700
committer	Yin Huai <yhuai@databricks.com>	2016-05-06 15:04:16 -0700
commit	f7b7ef41662d7d02fc4f834f3c6c4ee8802e949c (patch)
tree	715c731c578d7ebe519ae3b0473882164a418a20 /streaming/src
parent	e20cd9f4ce977739ce80a2c39f8ebae5e53f72f6 (diff)


## What changes were proposed in this pull request?

Let's say there are JSON files in the following directory structure:

```
xyz/file0.json
xyz/subdir1/file1.json
xyz/subdir2/file2.json
xyz/subdir1/subsubdir1/file3.json
```

`sqlContext.read.json("xyz")` should read only `file0.json`, matching the behavior of Spark 1.6.1. However, on current master all four files are read. The fix is to make FileCatalog return only the direct child files of the given path when no partitioning is detected (instead of the full recursive list of files).

Closes #12774

## How was this patch tested?

Unit tests.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #12856 from tdas/SPARK-14997.
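
The intended listing rule can be illustrated with a small standalone sketch. This is not Spark's FileCatalog code: the helper names (`looks_partitioned`, `files_to_read`, `make_tree`) are hypothetical, and the partition check is a deliberately simplified assumption (any direct child directory named like `col=value`), whereas Spark's real partition discovery is more involved.

```python
import tempfile
from pathlib import Path
from typing import List


def looks_partitioned(root: Path) -> bool:
    # Assumption for this sketch: a direct child directory whose name
    # contains "=" (e.g. "year=2016") counts as a partition directory.
    return any(p.is_dir() and "=" in p.name for p in root.iterdir())


def files_to_read(root: Path) -> List[str]:
    # With partitioning: take every file under root, recursively.
    # Without partitioning: take only the direct child files of root.
    if looks_partitioned(root):
        candidates = root.rglob("*")
    else:
        candidates = root.iterdir()
    return sorted(p.relative_to(root).as_posix() for p in candidates if p.is_file())


def make_tree(root: Path, rel_paths: List[str]) -> None:
    # Create empty JSON files at the given paths relative to root.
    for rel in rel_paths:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("{}\n")


# Layout from the commit message: no partition directories.
with tempfile.TemporaryDirectory() as tmp:
    xyz = Path(tmp) / "xyz"
    make_tree(xyz, [
        "file0.json",
        "subdir1/file1.json",
        "subdir2/file2.json",
        "subdir1/subsubdir1/file3.json",
    ])
    unpartitioned = files_to_read(xyz)

# A hypothetical partitioned layout, for contrast.
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "table"
    make_tree(table, ["year=2015/a.json", "year=2016/b.json"])
    partitioned = files_to_read(table)

print(unpartitioned)
print(partitioned)
```

Under these assumptions, the unpartitioned `xyz` layout yields only `file0.json`, while the partitioned layout yields the files under both partition directories.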


