diff options
author | Liwei Lin <lwlin7@gmail.com> | 2016-05-16 12:59:55 -0700 |
---|---|---|
committer | Shixiong Zhu <shixiong@databricks.com> | 2016-05-16 12:59:55 -0700 |
commit | 95f4fbae52d26ede94c3ba8248394749f3d95dcc (patch) | |
tree | e8c2b24ae903ab2c185570d51791e33e757c18e1 /mllib-local/src | |
parent | fabc8e5b128849a08d820d8c0b3425e39258e02e (diff) | |
download | spark-95f4fbae52d26ede94c3ba8248394749f3d95dcc.tar.gz spark-95f4fbae52d26ede94c3ba8248394749f3d95dcc.tar.bz2 spark-95f4fbae52d26ede94c3ba8248394749f3d95dcc.zip |
[SPARK-14942][SQL][STREAMING] Reduce delay between batch construction and execution
## Problem
Currently in `StreamExecution`, [we first run the batch, then construct the next](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L165):
```scala
if (dataAvailable) runBatch()
constructNextBatch()
```
This is good when we run batches ASAP, where data would get processed in the **very next batch**:
![1](https://cloud.githubusercontent.com/assets/15843379/14779964/2786e698-0b0d-11e6-9d2c-bb41513488b2.png)
However, when we run batches at trigger like `ProcessTime("1 minute")`, data - such as _y_ below - may not get processed in the very next batch i.e. _batch 1_, but in _batch 2_:
![2](https://cloud.githubusercontent.com/assets/15843379/14779818/6f3bb064-0b0c-11e6-9f16-c1ce4897186b.png)
## What changes were proposed in this pull request?
This patch reverses the order of `constructNextBatch()` and `runBatch()`. After this patch, data would get processed in the **very next batch**, i.e. _batch 1_:
![3](https://cloud.githubusercontent.com/assets/15843379/14779816/6f36ee62-0b0c-11e6-9e53-bc8397fade18.png)
In addition, this patch alters when we do `currentBatchId += 1`: let's do that when the processing of the current batch's data is completed, so we won't bother passing `currentBatchId + 1` or `currentBatchId - 1` to states or sinks.
## How was this patch tested?
New added test case. Also this should be covered by existing test suits, e.g. stress tests and others.
Author: Liwei Lin <lwlin7@gmail.com>
Closes #12725 from lw-lin/construct-before-run-3.
Diffstat (limited to 'mllib-local/src')
0 files changed, 0 insertions, 0 deletions