author Tyson Condie <tcondie@gmail.com> 2016-11-18 11:11:24 -0800
committer Shixiong Zhu <shixiong@databricks.com> 2016-11-18 11:11:24 -0800
commit 51baca2219fda8692b88fc8552548544aec73a1e
tree d4ce1c82e0e1b589c01f9fafc4639d946d8229b8 /data
parent d9dd979d170f44383a9a87f892f2486ddb3cca7d
[SPARK-18187][SQL] CompactibleFileStreamLog should not use "compactInterval" directly with user setting.
## What changes were proposed in this pull request?

CompactibleFileStreamLog relies on "compactInterval" to detect a compaction batch. If the user resets "compactInterval", CompactibleFileStreamLog will return a wrong answer, resulting in data loss. This PR provides a way to check the validity of 'compactInterval' and to calculate an appropriate value.

## How was this patch tested?

When restarting a stream, we change 'spark.sql.streaming.fileSource.log.compactInterval' to a value different from the previous one.

The primary solution to this issue was given by uncleGen. Added extensions include an additional metadata field in OffsetSeq and CompactibleFileStreamLog APIs.

zsxwing

Author: Tyson Condie <tcondie@gmail.com>
Author: genmao.ygm <genmao.ygm@genmaoygmdeMacBook-Air.local>

Closes #15852 from tcondie/spark-18187.
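To illustrate why a changed "compactInterval" can cause data loss, here is a minimal sketch in Python. It is not the Spark code. It makes three assumptions. First, with 0-based batch ids, batch `b` counts as a compaction batch when `(b + 1) % compactInterval == 0`. Second, non-compaction batch files record only their own batch's entries. Third, the id of the last real compaction batch can be recovered from the log itself, for example from how compacted files are named. The function names `is_compaction_batch`, `latest_compaction_batch` and `derive_compact_interval` are hypothetical. The derivation rule (the smallest divisor of `latest_compact_batch_id + 1` that is at least the requested interval) is one plausible way to "calculate an appropriate value", not necessarily the one the patch uses.

```python
# Hypothetical sketch of the compaction-interval logic described in the commit
# message. Function names and the divisor-based derivation are illustrative
# assumptions, not Spark's actual code.


def is_compaction_batch(batch_id: int, compact_interval: int) -> bool:
    """Assumed convention: with 0-based batch ids, every
    compact_interval-th batch is a compaction batch."""
    return (batch_id + 1) % compact_interval == 0


def latest_compaction_batch(latest_batch_id: int, compact_interval: int) -> int:
    """Most recent batch id <= latest_batch_id that compact_interval marks as a
    compaction batch, or -1 if there is none yet."""
    return (latest_batch_id + 1) // compact_interval * compact_interval - 1


def derive_compact_interval(requested: int, latest_compact_batch_id: int) -> int:
    """Choose an interval consistent with the compaction batch already on disk.

    Assumption: return the smallest divisor of (latest_compact_batch_id + 1)
    that is >= the requested interval, so latest_compact_batch_id remains a
    compaction batch under the returned interval.
    """
    if requested < 1 or latest_compact_batch_id < 0:
        raise ValueError("requested must be >= 1 and batch id must be >= 0")
    n = latest_compact_batch_id + 1
    if requested >= n:
        return n
    # n divides itself, so this search always finds a value.
    return next(d for d in range(requested, n + 1) if n % d == 0)


if __name__ == "__main__":
    # Log written with interval 10: compaction batches are 9, 19, 29, ...
    # Trusting a new user setting of 7 on restart points at batch 27,
    # which (under the assumptions above) holds only batch 27's entries,
    # so everything compacted into batch 19 would be skipped.
    print(latest_compaction_batch(29, 7))  # 27
    # Deriving the interval from the real compaction batch (29) instead:
    print(derive_compact_interval(7, 29))  # 10
    print(derive_compact_interval(4, 29))  # 5
```

Choosing a divisor keeps the existing compaction batch valid under the new interval and still stays as close as possible to what the user asked for. The trade-off is that the effective interval may be larger than the configured one.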
Diffstat (limited to 'data')
0 files changed, 0 insertions, 0 deletions