author: Burak Yavuz <brkyvz@gmail.com> 2016-09-22 13:05:41 -0700
committer: Josh Rosen <joshrosen@databricks.com> 2016-09-22 13:05:41 -0700
commit: 85d609cf25c1da2df3cd4f5d5aeaf3cbcf0d674c
tree: daa705aed500801fbf0d3487a67ef951bbbdf23d /streaming
parent: 9f24a17c59b1130d97efa7d313c06577f7344338
[SPARK-17613] S3A base paths with no '/' at the end return empty DataFrames
## What changes were proposed in this pull request?

Consider you have a bucket as `s3a://some-bucket` and under it you have files:
```
s3a://some-bucket/file1.parquet
s3a://some-bucket/file2.parquet
```
Getting the parent path of `s3a://some-bucket/file1.parquet` yields `s3a://some-bucket/` and the ListingFileCatalog uses this as the key in the hash map.

When catalog.allFiles is called, we use `s3a://some-bucket` (no slash at the end) to get the list of files, and we're left with an empty list!

This PR fixes this by adding a `/` at the end of the `URI` iff the given `Path` doesn't have a parent, i.e. is the root. This is a no-op if the path already had a `/` at the end, and is handled through the Hadoop Path, path merging semantics.

## How was this patch tested?

Unit test in `FileCatalogSuite`.

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #15169 from brkyvz/SPARK-17613.
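
The mismatch described above can be reproduced with a small toy model. This is only a sketch in plain Python, not Spark's or Hadoop's actual code: `parent_of`, `qualify_root`, and `leaf_files` are hypothetical names chosen for illustration, and a dictionary of strings stands in for the catalog's hash map of paths.

```python
from urllib.parse import urlsplit


def parent_of(uri):
    """Toy stand-in for taking a file's parent path.

    For a file directly under the bucket, the parent is the root,
    which is rendered with a trailing slash.
    """
    parts = urlsplit(uri)
    parent_path = parts.path.rsplit("/", 1)[0]  # "/file1.parquet" -> ""
    return f"{parts.scheme}://{parts.netloc}{parent_path or '/'}"


def qualify_root(uri):
    """Append '/' only when the URI has no path component (i.e. it is the root).

    Hypothetical helper mirroring the idea of the fix; a URI that already
    ends in '/' or points below the root is returned unchanged.
    """
    return uri + "/" if urlsplit(uri).path == "" else uri


files = [
    "s3a://some-bucket/file1.parquet",
    "s3a://some-bucket/file2.parquet",
]

# Index the files by parent directory, as a listing catalog might.
leaf_files = {}
for f in files:
    leaf_files.setdefault(parent_of(f), []).append(f)

# Looking up the base path exactly as the user wrote it misses the key.
naive = leaf_files.get("s3a://some-bucket", [])

# Normalizing the root path first finds both files.
fixed = leaf_files.get(qualify_root("s3a://some-bucket"), [])
```

In this model the only key in `leaf_files` is `s3a://some-bucket/`, so the unnormalized lookup comes back empty while the normalized one returns both files. Calling `qualify_root` on a path that already ends in `/` leaves it unchanged, which matches the no-op behavior the description mentions.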
Diffstat (limited to 'streaming')
0 files changed, 0 insertions, 0 deletions