author | jerryshao <sshao@hortonworks.com> | 2016-09-20 10:24:12 -0700 |
---|---|---|
committer | Shixiong Zhu <shixiong@databricks.com> | 2016-09-20 10:24:12 -0700 |
commit | a6aade0042d9c065669f46d2dac40ec6ce361e63 (patch) | |
tree | 01d6a34c34b222a6d1e406aa8de821f696cfcc67 /common/network-yarn/src | |
parent | eb004c66200da7df36dd0a9a11999fc352197916 (diff) | |
[SPARK-15698][SQL][STREAMING] Add the ability to remove the old MetadataLog in FileStreamSource
## What changes were proposed in this pull request?
Currently, the `metadataLog` in `FileStreamSource` adds a checkpoint file for each batch but has no ability to remove or compact old files, which leads to a large number of small files when a query runs for a long time. This patch proposes compacting the old logs into one file. The approach is quite similar to `FileStreamSinkLog` but simpler.
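To illustrate the idea of compacting per-batch log files, here is a minimal in-memory sketch. It is not the code from this patch: the class `CompactingLog`, its methods, and the rule that a batch triggers compaction when `(batch_id + 1) % compact_interval == 0` are assumptions chosen for illustration, and a dictionary stands in for the files on disk.

```python
class CompactingLog:
    """Toy metadata log. All names and rules here are illustrative assumptions."""

    def __init__(self, compact_interval):
        if compact_interval < 1:
            raise ValueError("compact_interval must be at least 1")
        self.compact_interval = compact_interval
        # batch_id -> list of entries; each key stands in for one log file.
        self.files = {}

    def is_compaction_batch(self, batch_id):
        # Assumed rule: every compact_interval-th batch is a compaction batch.
        return (batch_id + 1) % self.compact_interval == 0

    def add(self, batch_id, entries):
        if self.is_compaction_batch(batch_id):
            # Merge every older file plus the new entries into a single file,
            # then drop the older files.
            merged = []
            for old_id in sorted(self.files):
                merged.extend(self.files[old_id])
            merged.extend(entries)
            self.files = {batch_id: merged}
        else:
            # Ordinary batch: write one small file.
            self.files[batch_id] = list(entries)

    def all_entries(self):
        # Read back the full history in batch order.
        return [e for b in sorted(self.files) for e in self.files[b]]


log = CompactingLog(compact_interval=3)
for i in range(5):
    log.add(i, [f"file-{i}"])
```

With an interval of 3, batch 2 folds batches 0 to 2 into one file, so after five batches only three files remain while the full history is still readable. A real implementation would likely also retain old files for some time so that concurrent readers are not broken; this sketch deletes them immediately for brevity.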
## How was this patch tested?
Unit test added.
Author: jerryshao <sshao@hortonworks.com>
Closes #13513 from jerryshao/SPARK-15698.
Diffstat (limited to 'common/network-yarn/src')
0 files changed, 0 insertions, 0 deletions