aboutsummaryrefslogtreecommitdiff
path: root/sql/hive
diff options
context:
space:
mode:
authorfrreiss <frreiss@us.ibm.com>2016-10-26 17:33:08 -0700
committerShixiong Zhu <shixiong@databricks.com>2016-10-26 17:33:08 -0700
commit5b27598ff50cb08e7570fade458da0a3d4d4eabc (patch)
treecb1aa8d34585bf459168bd3e5a323637fe686877 /sql/hive
parenta76846cfb1c2d6c8f4d647426030b59de20d9433 (diff)
downloadspark-5b27598ff50cb08e7570fade458da0a3d4d4eabc.tar.gz
spark-5b27598ff50cb08e7570fade458da0a3d4d4eabc.tar.bz2
spark-5b27598ff50cb08e7570fade458da0a3d4d4eabc.zip
[SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes
## What changes were proposed in this pull request? This PR contains changes to the Source trait such that the scheduler can notify data sources when it is safe to discard buffered data. Summary of changes: * Added a method `commit(end: Offset)` that tells the Source that is OK to discard all offsets up `end`, inclusive. * Changed the semantics of a `None` value for the `getBatch` method to mean "from the very beginning of the stream"; as opposed to "all data present in the Source's buffer". * Added notes that the upper layers of the system will never call `getBatch` with a start value less than the last value passed to `commit`. * Added a `lastCommittedOffset` method to allow the scheduler to query the status of each Source on restart. This addition is not strictly necessary, but it seemed like a good idea -- Sources will be maintaining their own persistent state, and there may be bugs in the checkpointing code. * The scheduler in `StreamExecution.scala` now calls `commit` on its stream sources after marking each batch as complete in its checkpoint. * `MemoryStream` now cleans committed batches out of its internal buffer. * `TextSocketSource` now cleans committed batches from its internal buffer. ## How was this patch tested? Existing regression tests already exercise the new code. Author: frreiss <frreiss@us.ibm.com> Closes #14553 from frreiss/fred-16963.
Diffstat (limited to 'sql/hive')
0 files changed, 0 insertions, 0 deletions