aboutsummaryrefslogtreecommitdiff
path: root/data
diff options
context:
space:
mode:
authorLiwei Lin <lwlin7@gmail.com>2016-05-16 12:59:55 -0700
committerShixiong Zhu <shixiong@databricks.com>2016-05-16 12:59:55 -0700
commit95f4fbae52d26ede94c3ba8248394749f3d95dcc (patch)
treee8c2b24ae903ab2c185570d51791e33e757c18e1 /data
parentfabc8e5b128849a08d820d8c0b3425e39258e02e (diff)
downloadspark-95f4fbae52d26ede94c3ba8248394749f3d95dcc.tar.gz
spark-95f4fbae52d26ede94c3ba8248394749f3d95dcc.tar.bz2
spark-95f4fbae52d26ede94c3ba8248394749f3d95dcc.zip
[SPARK-14942][SQL][STREAMING] Reduce delay between batch construction and execution
## Problem Currently in `StreamExecution`, [we first run the batch, then construct the next](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L165): ```scala if (dataAvailable) runBatch() constructNextBatch() ``` This is good when we run batches ASAP, where data would get processed in the **very next batch**: ![1](https://cloud.githubusercontent.com/assets/15843379/14779964/2786e698-0b0d-11e6-9d2c-bb41513488b2.png) However, when we run batches at trigger like `ProcessTime("1 minute")`, data - such as _y_ below - may not get processed in the very next batch i.e. _batch 1_, but in _batch 2_: ![2](https://cloud.githubusercontent.com/assets/15843379/14779818/6f3bb064-0b0c-11e6-9f16-c1ce4897186b.png) ## What changes were proposed in this pull request? This patch reverses the order of `constructNextBatch()` and `runBatch()`. After this patch, data would get processed in the **very next batch**, i.e. _batch 1_: ![3](https://cloud.githubusercontent.com/assets/15843379/14779816/6f36ee62-0b0c-11e6-9e53-bc8397fade18.png) In addition, this patch alters when we do `currentBatchId += 1`: let's do that when the processing of the current batch's data is completed, so we won't bother passing `currentBatchId + 1` or `currentBatchId - 1` to states or sinks. ## How was this patch tested? New added test case. Also this should be covered by existing test suits, e.g. stress tests and others. Author: Liwei Lin <lwlin7@gmail.com> Closes #12725 from lw-lin/construct-before-run-3.
Diffstat (limited to 'data')
0 files changed, 0 insertions, 0 deletions