path: root/sql/core/src/test/java
author		Michael Armbrust <michael@databricks.com>	2016-03-22 10:18:42 -0700
committer	Michael Armbrust <michael@databricks.com>	2016-03-22 10:18:42 -0700
commit		caea15214571d9b12dcf1553e5c1cc8b83a8ba5b (patch)
tree		bc3e49ee19c98636bd249685011fe1ae3879cebc /sql/core/src/test/java
parent		c632bdc01f51bb253fa3dc258ffa7fdecf814d35 (diff)
download	spark-caea15214571d9b12dcf1553e5c1cc8b83a8ba5b.tar.gz
		spark-caea15214571d9b12dcf1553e5c1cc8b83a8ba5b.tar.bz2
		spark-caea15214571d9b12dcf1553e5c1cc8b83a8ba5b.zip
[SPARK-13985][SQL] Deterministic batches with ids
This PR relaxes the requirements of a `Sink` for structured streaming to only require idempotent appending of data. Previously the `Sink` needed to be able to transactionally append data while recording an opaque offset indicating how far in the stream we have processed. In order to do this, a new write-ahead log has been added to stream execution, which records the offsets that are present in each batch. The log is created in the newly added `checkpointLocation`, which defaults to `${spark.sql.streaming.checkpointLocation}/${queryName}` but can be overridden by setting `checkpointLocation` in `DataFrameWriter`. In addition to making sinks easier to write, the addition of batchIds and a checkpoint location is done in anticipation of integration with the `StateStore` (#11645).

Author: Michael Armbrust <michael@databricks.com>

Closes #11804 from marmbrus/batchIds.
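For illustration, here is a minimal Scala sketch of the relaxed contract described above: data arrives in deterministically numbered batches, and the sink only has to append each batch idempotently, skipping any batch id it has already committed. The `SimpleSink` trait and `IdempotentFileSink` class are hypothetical names for this sketch; the exact shape of the `Sink` trait introduced by this commit may differ.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical simplified contract: the engine hands the sink deterministically
// numbered batches, and the sink only needs to append each one idempotently.
trait SimpleSink {
  def addBatch(batchId: Long, data: DataFrame): Unit
}

// Example sink that remembers the highest batch id it has committed. Because
// the write-ahead log guarantees a given batch id always carries the same
// offsets, re-delivery of a batch after a failure is harmless: the id check
// below makes the append idempotent without any transactional machinery.
class IdempotentFileSink(path: String) extends SimpleSink {
  @volatile private var lastCommittedBatchId: Long = -1L

  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    if (batchId <= lastCommittedBatchId) {
      // This batch was already appended before a failure/restart; do nothing.
    } else {
      // Append the batch's rows (parquet is chosen arbitrarily for the sketch).
      data.write.mode("append").parquet(s"$path/batch-$batchId")
      lastCommittedBatchId = batchId
    }
  }
}
```

Per the description above, a query's log would live under `${spark.sql.streaming.checkpointLocation}/${queryName}` unless `checkpointLocation` is set explicitly on the `DataFrameWriter`.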
Diffstat (limited to 'sql/core/src/test/java')
0 files changed, 0 insertions, 0 deletions