author | Roberto Agostino Vitillo <ra.vitillo@gmail.com> | 2017-02-28 10:49:07 -0800 |
---|---|---|
committer | Shixiong Zhu <shixiong@databricks.com> | 2017-02-28 10:49:07 -0800 |
commit | 9734a928a75d29ea202e9f309f92ca4637d35671 (patch) | |
tree | 6074e39c3c117374bb2c0006ba56d6d8d700e162 /python/pyspark | |
parent | 7c7fc30b4ae8e4ebd4ededf92240fed10481f2dd (diff) | |
[SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS
## What changes were proposed in this pull request?
`HDFSBackedStateStoreProvider` fails when it renames a delta file onto an existing one on HDFS, although the same rename succeeds on the local filesystem. According to the [implementation notes](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html) of `rename()`, the two filesystems behave differently:
> Destination exists and is a file
>
> Renaming a file atop an existing file is specified as failing, raising an exception.
>
> - Local FileSystem : the rename succeeds; the destination file is replaced by the source file.
> - HDFS : The rename fails, no exception is raised. Instead the method call simply returns false.
This patch ensures that `rename()` isn't called if the destination file already exists. Skipping the rename is still semantically correct: Structured Streaming requires that rerunning a batch generates the same output, so the existing delta file already holds the data the new one would contain.
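The guard described above can be illustrated with a small in-memory model. This is only a sketch, not the code in this patch: `ToyFileSystem`, its `replace_on_rename` flag, and `commit_delta` are hypothetical names made up for illustration, and deleting the leftover temporary file is an assumption about what a reasonable cleanup would look like.

```python
class ToyFileSystem:
    """In-memory stand-in for a filesystem (hypothetical, NOT the Hadoop API)."""

    def __init__(self, replace_on_rename):
        # True mimics the quoted local-filesystem behavior,
        # False mimics the quoted HDFS behavior.
        self.replace_on_rename = replace_on_rename
        self.files = {}  # path -> contents

    def exists(self, path):
        return path in self.files

    def rename(self, src, dst):
        if src not in self.files:
            return False
        if dst in self.files and not self.replace_on_rename:
            # HDFS-like: no exception is raised, the call just returns False.
            return False
        self.files[dst] = self.files.pop(src)
        return True


def commit_delta(fs, temp_path, final_path):
    """Move a temporary delta file into place without renaming atop an existing file."""
    if fs.exists(final_path):
        # A rerun batch must produce the same output, so the existing file
        # is kept. Dropping the temp file is an assumption of this sketch.
        fs.files.pop(temp_path, None)
    elif not fs.rename(temp_path, final_path):
        raise IOError(f"Failed to rename {temp_path} to {final_path}")
    return final_path


# Usage: rerunning a batch on an HDFS-like filesystem no longer fails.
hdfs = ToyFileSystem(replace_on_rename=False)
hdfs.files["1.delta"] = "batch-1 state"
hdfs.files["temp-1"] = "batch-1 state"
commit_delta(hdfs, "temp-1", "1.delta")
```

Because the existence check runs before `rename()`, the outcome is the same under both rename behaviors in this model, which is the property the patch relies on.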
## How was this patch tested?
This patch was tested by running `StateStoreSuite`.
Author: Roberto Agostino Vitillo <ra.vitillo@gmail.com>
Closes #17012 from vitillo/fix_rename.
Diffstat (limited to 'python/pyspark')
0 files changed, 0 insertions, 0 deletions