author | Michael Armbrust <michael@databricks.com> | 2016-02-02 10:13:54 -0800 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2016-02-02 10:13:54 -0800 |
commit | 12a20c144f14e80ef120ddcfb0b455a805a2da23 (patch) | |
tree | 9debd487706ae360a4b4324e631a53cf8ab16ff5 /README.md | |
parent | 22ba21348b28d8b1909ccde6fe17fb9e68531e5a (diff) | |
[SPARK-10820][SQL] Support for the continuous execution of structured queries
This is a follow-up to 9aadcffabd226557174f3ff566927f873c71672e that extends Spark SQL to allow users to _repeatedly_ optimize and execute structured queries. A `ContinuousQuery` can be expressed using SQL, DataFrames, or Datasets. This PR adds only the initial infrastructure; subsequent PRs will extend it.
## User-facing API
- `sqlContext.streamFrom` and `df.streamTo` return builder objects that are analogous to the `read/write` interfaces already available for executing queries in a batch-oriented fashion.
- `ContinuousQuery` provides an interface for interacting with a query that is currently executing in the background.
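
To illustrate what "a handle for a query running in the background" can look like, here is a minimal toy sketch in Python. It is not Spark's API: `QueryHandle`, `is_active`, and `stop` are hypothetical names chosen for illustration, and the "query" is just a function applied to items pulled from an in-process queue.

```python
import queue
import threading


class QueryHandle:
    """Hypothetical handle for a computation running on a background thread."""

    def __init__(self, inputs, transform, results):
        self._inputs = inputs          # queue.Queue of incoming records
        self._transform = transform    # stands in for the "query"
        self._results = results        # list that collects the output
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Keep processing until the caller asks the query to stop.
        while not self._stop.is_set():
            try:
                item = self._inputs.get(timeout=0.05)
            except queue.Empty:
                continue
            self._results.append(self._transform(item))
            self._inputs.task_done()

    @property
    def is_active(self):
        return self._thread.is_alive()

    def stop(self):
        # Signal the background loop and wait for it to exit.
        self._stop.set()
        self._thread.join()


inputs, results = queue.Queue(), []
handle = QueryHandle(inputs, str.upper, results)
inputs.put("a")
inputs.put("b")
inputs.join()      # wait until both records have been processed
handle.stop()
print(results)     # ['A', 'B']
```

The caller never touches the worker thread directly; everything goes through the handle, which is the role the description above assigns to `ContinuousQuery`.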
## Internal Interfaces
- `StreamExecution` - executes streaming queries in micro-batches
The following are currently internal, but public APIs will be provided in a future release.
- `Source` - an interface for providers of continually arriving data. A source must have a notion of an `Offset` that monotonically tracks what data has arrived. For fault tolerance, a source must be able to replay data given a start offset.
- `Sink` - an interface that accepts the results of a continuously executing query. A sink is also responsible for tracking the offset from which execution should resume after a failure.
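
The contract between source, sink, and micro-batch execution can be sketched with a toy model. The Python below is an assumption-laden illustration, not the Scala interfaces added by this PR: `ListSource`, `ListSink`, `run_micro_batch`, and all method names are hypothetical, and the offset is simply the number of records that have arrived.

```python
class ListSource:
    """Hypothetical source whose offset is the count of records received."""

    def __init__(self):
        self._records = []

    def add(self, *records):
        self._records.extend(records)

    def latest_offset(self):
        # Monotonic: the offset only grows as data arrives.
        return len(self._records)

    def get_batch(self, start, end):
        # Replayable: the same offset range always yields the same records.
        return self._records[start:end]


class ListSink:
    """Hypothetical sink that stores results and the offset they cover."""

    def __init__(self):
        self.rows = []
        self.committed_offset = 0

    def add_batch(self, end_offset, batch):
        self.rows.extend(batch)
        self.committed_offset = end_offset


def run_micro_batch(source, sink, query):
    """Process everything that arrived since the sink's committed offset."""
    start = sink.committed_offset
    end = source.latest_offset()
    if end == start:
        return False  # no new data
    sink.add_batch(end, [query(r) for r in source.get_batch(start, end)])
    return True


source, sink = ListSource(), ListSink()
source.add("a", "b")
run_micro_batch(source, sink, str.upper)
source.add("c")
run_micro_batch(source, sink, str.upper)
print(sink.rows, sink.committed_offset)  # ['A', 'B', 'C'] 3
```

Because the sink records the offset alongside the results, a restarted execution in this sketch only needs to ask the sink where to resume and the source to replay from that point.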
## Testing
- `MemoryStream` and `MemorySink` - simple implementations of source and sink that keep all data in memory and have methods for simulating durability failures
- `StreamTest` - a framework for performing actions and checking invariants on a continuous query
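
As a rough illustration of this testing style, the sketch below drives a toy in-memory source and sink through a scripted list of actions, including a simulated durability failure. It is a Python stand-in written under stated assumptions, not the Scala test framework from this PR: the class names are reused only for readability, and `drop_batches`, `run_test`, and the `("add" | "check" | "drop", arg)` action tuples are hypothetical.

```python
class MemoryStream:
    """Toy in-memory source; the offset is the number of items added."""

    def __init__(self):
        self.data = []

    def add_data(self, *items):
        self.data.extend(items)

    def latest_offset(self):
        return len(self.data)

    def get_batch(self, start, end):
        return self.data[start:end]


class MemorySink:
    """Toy in-memory sink that keeps each batch with its end offset."""

    def __init__(self):
        self.batches = []  # list of (end_offset, rows)

    def current_offset(self):
        return self.batches[-1][0] if self.batches else 0

    def add_batch(self, end_offset, rows):
        self.batches.append((end_offset, rows))

    def all_rows(self):
        return [r for _, rows in self.batches for r in rows]

    def drop_batches(self, n):
        # Simulate a durability failure by forgetting the last n batches.
        if n > 0:
            del self.batches[-n:]


def run_test(query, actions):
    """Apply scripted actions and check the sink contents at each 'check'."""
    stream, sink = MemoryStream(), MemorySink()
    for kind, arg in actions:
        if kind == "add":
            stream.add_data(*arg)
        elif kind == "drop":
            sink.drop_batches(arg)
        elif kind == "check":
            # Run one micro-batch from the sink's offset, then verify.
            start, end = sink.current_offset(), stream.latest_offset()
            if end > start:
                rows = [query(x) for x in stream.get_batch(start, end)]
                sink.add_batch(end, rows)
            assert sink.all_rows() == arg, f"expected {arg}, got {sink.all_rows()}"
        else:
            raise ValueError(f"unknown action: {kind}")
    return sink


run_test(lambda x: x + 1, [
    ("add", [1, 2]),
    ("check", [2, 3]),
    ("add", [3]),
    ("check", [2, 3, 4]),
    ("drop", 1),            # lose the most recent batch
    ("check", [2, 3, 4]),   # replay from the source restores it
])
```

The final check passes only because the source can replay data from the offset the sink still remembers, which is the invariant such a test is meant to exercise.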
Author: Michael Armbrust <michael@databricks.com>
Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Josh Rosen <rosenville@gmail.com>
Closes #11006 from marmbrus/structured-streaming.