author: Michael Armbrust <michael@databricks.com>, 2016-02-02 10:13:54 -0800
committer: Michael Armbrust <michael@databricks.com>, 2016-02-02 10:13:54 -0800
commit: 12a20c144f14e80ef120ddcfb0b455a805a2da23
tree: 9debd487706ae360a4b4324e631a53cf8ab16ff5 /README.md
parent: 22ba21348b28d8b1909ccde6fe17fb9e68531e5a
[SPARK-10820][SQL] Support for the continuous execution of structured queries
This is a follow-up to 9aadcffabd226557174f3ff566927f873c71672e that extends Spark SQL to allow users to _repeatedly_ optimize and execute structured queries. A `ContinuousQuery` can be expressed using SQL, DataFrames or Datasets. The purpose of this PR is only to add some initial infrastructure, which will be extended in subsequent PRs.

## User-facing API

- `sqlContext.streamFrom` and `df.streamTo` return builder objects that are analogous to the `read/write` interfaces already available for executing queries in a batch-oriented fashion.
- `ContinuousQuery` provides an interface for interacting with a query that is currently executing in the background.

## Internal Interfaces

- `StreamExecution` - executes streaming queries in micro-batches

The following are currently internal, but public APIs will be provided in a future release.

- `Source` - an interface for providers of continually arriving data. A source must have a notion of an `Offset` that monotonically tracks what data has arrived. For fault tolerance, a source must be able to replay data given a start offset.
- `Sink` - an interface that accepts the results of a continuously executing query. It is also responsible for tracking the offset from which execution should resume after a failure.

## Testing

- `MemoryStream` and `MemorySink` - simple implementations of source and sink that keep all data in memory and have methods for simulating durability failures
- `StreamTest` - a framework for performing actions and checking invariants on a continuous query

Author: Michael Armbrust <michael@databricks.com>
Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Josh Rosen <rosenville@gmail.com>

Closes #11006 from marmbrus/structured-streaming.
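
The `Source`/`Offset`/`Sink` contract described above can be illustrated with a small toy model. This is a sketch in plain Python, not Spark's API: the class names `ToySource` and `ToySink`, the function `run_micro_batch`, and the use of an integer record count as the offset are all assumptions made for illustration. It shows only the idea that the sink remembers the last offset it received, so that an execution loop can ask the source to replay everything after that point.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToySource:
    """Hypothetical source: an append-only log; the offset is the record count."""
    _log: List[int] = field(default_factory=list)

    def add_data(self, *values: int) -> None:
        self._log.extend(values)

    def latest_offset(self) -> int:
        # Only ever grows, so it monotonically tracks what data has arrived.
        return len(self._log)

    def get_batch(self, start: int, end: int) -> List[int]:
        # Replay: any already-arrived range can be read again from a start offset.
        return self._log[start:end]


@dataclass
class ToySink:
    """Hypothetical sink: keeps results together with the offset they cover."""
    rows: List[int] = field(default_factory=list)
    committed_offset: int = 0

    def add_batch(self, end_offset: int, batch: List[int]) -> None:
        # Results and offset are recorded together, so a restart knows
        # where to resume.
        self.rows.extend(batch)
        self.committed_offset = end_offset


def run_micro_batch(source: ToySource, sink: ToySink,
                    query: Callable[[int], int]) -> bool:
    """Process all data that arrived since the sink's last recorded offset.

    Returns False when there is nothing new to process.
    """
    start = sink.committed_offset
    end = source.latest_offset()
    if end == start:
        return False
    sink.add_batch(end, [query(v) for v in source.get_batch(start, end)])
    return True
```

Because `run_micro_batch` reads its starting point from the sink rather than from its own state, calling it again after a simulated restart picks up exactly where the last recorded batch ended in this toy model.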
Diffstat (limited to 'README.md')
0 files changed, 0 insertions, 0 deletions