author | Tathagata Das <tathagata.das1565@gmail.com> | 2016-03-23 12:48:05 -0700 |
---|---|---|
committer | Tathagata Das <tathagata.das1565@gmail.com> | 2016-03-23 12:48:05 -0700 |
commit | 8c826880f5eaa3221c4e9e7d3fece54e821a0b98 (patch) | |
tree | b6dbe3670844bac231b787ccd9a97d2797f0a181 /docs/img | |
parent | 0a64294fcb4b64bfe095c63c3a494e0f40e22743 (diff) | |
[SPARK-13809][SQL] State store for streaming aggregations
## What changes were proposed in this pull request?
This PR implements a new abstraction for managing streaming state data: the State Store. It is a key-value store that persists running aggregates for aggregate operations on streaming DataFrames. The motivation and design are discussed in the following document:
https://docs.google.com/document/d/1-ncawFx8JS5Zyfq1HAEGBx56RDet9wfVp_hDM8ZL254/edit#
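To make the idea of "a key-value store for persisting running aggregates" concrete, here is a minimal conceptual sketch in Python. It is not the code from this PR and not Spark's API: `ToyStateStore`, `begin`, `commit`, and `count_batch` are hypothetical names, the store lives entirely in memory, and versioned snapshots are an assumption made for illustration.

```python
from typing import Dict, List, Optional


class ToyStateStore:
    """Hypothetical in-memory stand-in for a versioned key-value state store."""

    def __init__(self) -> None:
        # Committed snapshots indexed by version; version 0 is the empty store.
        self._versions: Dict[int, Dict[str, int]] = {0: {}}
        self._working: Optional[Dict[str, int]] = None
        self._base_version = 0

    def begin(self, version: int) -> None:
        """Start a batch on top of an already committed version."""
        self._working = dict(self._versions[version])
        self._base_version = version

    def get(self, key: str) -> Optional[int]:
        assert self._working is not None, "call begin() first"
        return self._working.get(key)

    def put(self, key: str, value: int) -> None:
        assert self._working is not None, "call begin() first"
        self._working[key] = value

    def commit(self) -> int:
        """Publish the working copy as the next version and return its number."""
        assert self._working is not None, "call begin() first"
        new_version = self._base_version + 1
        self._versions[new_version] = self._working
        self._working = None
        return new_version


def count_batch(store: ToyStateStore, version: int, words: List[str]) -> int:
    """Fold one batch of words into the running per-word counts."""
    store.begin(version)
    for word in words:
        store.put(word, (store.get(word) or 0) + 1)
    return store.commit()


store = ToyStateStore()
v1 = count_batch(store, 0, ["a", "b", "a"])
v2 = count_batch(store, v1, ["b", "c"])

store.begin(v2)
print(store.get("a"), store.get("b"), store.get("c"))  # prints "2 2 1"
```

Because each batch starts from a specific committed version, a failed batch can be replayed from the previous version without double counting; in this sketch that is simply calling `count_batch(store, v1, ...)` again.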
## How was this patch tested?
- [x] Unit tests
- [x] Cluster tests
**Coverage from unit tests**
<img width="952" alt="screen shot 2016-03-21 at 3 09 40 pm" src="https://cloud.githubusercontent.com/assets/663212/13935872/fdc8ba86-ef76-11e5-93e8-9fa310472c7b.png">
## TODO
- [x] Fix updates() iterator to avoid duplicate updates for same key
- [x] Use Coordinator in ContinuousQueryManager
- [x] Plugging in hadoop conf and other confs
- [x] Unit tests
  - [x] StateStore object lifecycle and methods
  - [x] StateStoreCoordinator communication and logic
  - [x] StateStoreRDD fault-tolerance
  - [x] StateStoreRDD preferred location using StateStoreCoordinator
- [ ] Cluster tests
  - [ ] Whether preferred locations are set correctly
  - [ ] Whether recovery works correctly with distributed storage
- [x] Basic performance tests
- [x] Docs
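The first item above, avoiding duplicate updates for the same key, can be illustrated with a small sketch. This shows one possible approach, assumed here for illustration only; the description does not say how the PR implements it. `UpdateTracker` and its methods are hypothetical names. Pending updates are kept in a map keyed by the state key, so a later write to a key replaces the earlier one instead of appending a second entry.

```python
from typing import Dict, Iterator, Optional, Tuple


class UpdateTracker:
    """Hypothetical tracker that records at most one pending update per key."""

    def __init__(self) -> None:
        self._state: Dict[str, int] = {}
        # Keyed by state key, so a later write overwrites an earlier one.
        # A value of None marks the key as removed in this batch.
        self._pending: Dict[str, Optional[int]] = {}

    def put(self, key: str, value: int) -> None:
        self._state[key] = value
        self._pending[key] = value

    def remove(self, key: str) -> None:
        self._state.pop(key, None)
        self._pending[key] = None

    def updates(self) -> Iterator[Tuple[str, Optional[int]]]:
        """Yield the final update for each key touched in this batch."""
        return iter(self._pending.items())


tracker = UpdateTracker()
tracker.put("a", 1)
tracker.put("a", 2)   # replaces the pending update for "a"
tracker.put("b", 5)
tracker.remove("b")   # replaces the pending update for "b" with a removal

print(list(tracker.updates()))  # prints [('a', 2), ('b', None)]
```

An append-only log of updates would instead report four entries here, two per key, which is the kind of duplication the TODO item refers to.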
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #11645 from tdas/state-store.
Diffstat (limited to 'docs/img')
0 files changed, 0 insertions, 0 deletions