author | Takeshi YAMAMURO <linguin.m.s@gmail.com> | 2016-12-10 05:32:04 +0800 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2016-12-10 05:32:04 +0800 |
commit | b08b5004563b28d10b07b70946a9f72408ed228a (patch) | |
tree | 74f57e536ce48227fd239f915d1e8479a5b00fe9 /.travis.yml | |
parent | be5fc6ef72c7eb586b184b0f42ac50ef32843208 (diff) | |
download | spark-b08b5004563b28d10b07b70946a9f72408ed228a.tar.gz spark-b08b5004563b28d10b07b70946a9f72408ed228a.tar.bz2 spark-b08b5004563b28d10b07b70946a9f72408ed228a.zip |
[SPARK-18620][STREAMING][KINESIS] Flatten input rates in timeline for streaming + kinesis
## What changes were proposed in this pull request?
This PR flattens the input rates shown in the timeline for Spark Streaming + Kinesis.
Because Kinesis workers fetch records and push them into block generators in bulk, the timeline in the web UI shows many spikes when `maxRates` is applied (see Figure.1 below). This fix splits the fetched input records across multiple `addRecords` calls.
Figure.1 Apply `maxRates=500` in vanilla Spark
<img width="1084" alt="apply_limit in_vanilla_spark" src="https://cloud.githubusercontent.com/assets/692303/20823861/4602f300-b89b-11e6-95f3-164a37061305.png">
Figure.2 Apply `maxRates=500` in Spark with my patch
<img width="1056" alt="apply_limit in_spark_with_my_patch" src="https://cloud.githubusercontent.com/assets/692303/20823882/6c46352c-b89b-11e6-81ab-afd8abfe0cfe.png">
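The splitting idea can be illustrated with a minimal sketch. This is not the patch itself (the patch lives in Spark's Scala code); it is a Python illustration under assumed names: `add_records_in_chunks`, `max_records`, and the `add_records` callback are hypothetical stand-ins for the receiver's current rate limit and its `addRecords` call.

```python
from typing import Callable, List, Sequence


def add_records_in_chunks(
    records: Sequence[int],
    max_records: int,
    add_records: Callable[[Sequence[int]], None],
) -> int:
    """Hand a fetched batch to `add_records` in slices of at most `max_records`.

    Hypothetical helper: `max_records` stands in for the receiver's current
    rate limit, and `add_records` for the receiver's `addRecords` call.
    Returns the number of calls made.
    """
    if max_records <= 0:
        raise ValueError("max_records must be positive")
    calls = 0
    for start in range(0, len(records), max_records):
        # Each slice holds at most max_records items; the last may be shorter.
        add_records(records[start:start + max_records])
        calls += 1
    return calls


# Usage: a bulk fetch of 1200 records with a limit of 500 per call.
received: List[Sequence[int]] = []
n_calls = add_records_in_chunks(list(range(1200)), 500, received.append)
chunk_sizes = [len(chunk) for chunk in received]
```

With one bulk call, all 1200 records would reach the block generator at once; with the chunked calls, the rate limiter gets a chance to pace each slice, which is what would smooth the spikes seen in Figure.1.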
## How was this patch tested?
Added tests that check that input records are split into multiple `addRecords` calls.
Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
Closes #16114 from maropu/SPARK-18620.