author Tathagata Das <tathagata.das1565@gmail.com> 2016-03-23 12:48:05 -0700
committer Tathagata Das <tathagata.das1565@gmail.com> 2016-03-23 12:48:05 -0700
commit 8c826880f5eaa3221c4e9e7d3fece54e821a0b98
tree b6dbe3670844bac231b787ccd9a97d2797f0a181 /core
parent 0a64294fcb4b64bfe095c63c3a494e0f40e22743
[SPARK-13809][SQL] State store for streaming aggregations
## What changes were proposed in this pull request?

In this PR, I am implementing a new abstraction for management of streaming state data - State Store. It is a key-value store for persisting running aggregates for aggregate operations in streaming dataframes. The motivation and design are discussed here:
https://docs.google.com/document/d/1-ncawFx8JS5Zyfq1HAEGBx56RDet9wfVp_hDM8ZL254/edit#

## How was this patch tested?

- [x] Unit tests
- [x] Cluster tests

**Coverage from unit tests**

<img width="952" alt="screen shot 2016-03-21 at 3 09 40 pm" src="https://cloud.githubusercontent.com/assets/663212/13935872/fdc8ba86-ef76-11e5-93e8-9fa310472c7b.png">

## TODO

- [x] Fix updates() iterator to avoid duplicate updates for same key
- [x] Use Coordinator in ContinuousQueryManager
- [x] Plugging in hadoop conf and other confs
- [x] Unit tests
  - [x] StateStore object lifecycle and methods
  - [x] StateStoreCoordinator communication and logic
  - [x] StateStoreRDD fault-tolerance
  - [x] StateStoreRDD preferred location using StateStoreCoordinator
- [ ] Cluster tests
  - [ ] Whether preferred locations are set correctly
  - [ ] Whether recovery works correctly with distributed storage
- [x] Basic performance tests
- [x] Docs

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #11645 from tdas/state-store.
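For readers unfamiliar with the idea, here is a minimal Python sketch of a versioned key-value store holding running aggregates across batches. It is an illustration only, not the code in this patch: `ToyStateStore`, `run_batch`, and every method signature below are hypothetical, and the real implementation (persistence, the coordinator, the RDD integration) is not modeled. The sketch also shows one way an `updates()` iterator could yield at most one entry per key, as the TODO item above asks.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class ToyStateStore:
    """Hypothetical in-memory stand-in; NOT Spark's StateStore API."""

    def __init__(self) -> None:
        self.version = 0                              # last committed version
        self._committed: Dict[str, int] = {}          # state as of self.version
        self._pending: Dict[str, Optional[int]] = {}  # uncommitted changes; None marks a removal

    def get(self, key: str) -> Optional[int]:
        # Pending changes shadow the committed state.
        if key in self._pending:
            return self._pending[key]
        return self._committed.get(key)

    def put(self, key: str, value: int) -> None:
        self._pending[key] = value

    def remove(self, key: str) -> None:
        self._pending[key] = None

    def updates(self) -> Iterator[Tuple[str, Optional[int]]]:
        # Keyed by dict, so each key appears once with its latest pending value.
        return iter(self._pending.items())

    def commit(self) -> int:
        # Fold pending changes into the committed state and bump the version.
        for key, value in self._pending.items():
            if value is None:
                self._committed.pop(key, None)
            else:
                self._committed[key] = value
        self._pending = {}
        self.version += 1
        return self.version


def run_batch(store: ToyStateStore, words: List[str]) -> List[Tuple[str, Optional[int]]]:
    """Update running word counts for one batch, then commit (hypothetical helper)."""
    for word in words:
        store.put(word, (store.get(word) or 0) + 1)
    changed = list(store.updates())
    store.commit()
    return changed
```

Calling `run_batch` once per micro-batch carries the counts forward from one version to the next, which is the role the description assigns to the State Store for streaming aggregations.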
Diffstat (limited to 'core')
0 files changed, 0 insertions, 0 deletions