[SPARK-10820][SQL] Support for the continuous execution of structured queries - spark

author	Michael Armbrust <michael@databricks.com>	2016-02-02 10:13:54 -0800
committer	Michael Armbrust <michael@databricks.com>	2016-02-02 10:13:54 -0800
commit	12a20c144f14e80ef120ddcfb0b455a805a2da23 (patch)
tree	9debd487706ae360a4b4324e631a53cf8ab16ff5 /README.md
parent	22ba21348b28d8b1909ccde6fe17fb9e68531e5a (diff)

[SPARK-10820][SQL] Support for the continuous execution of structured queries

This is a follow-up to 9aadcffabd226557174f3ff566927f873c71672e that extends Spark SQL to allow users to _repeatedly_ optimize and execute structured queries. A `ContinuousQuery` can be expressed using SQL, DataFrames or Datasets. The purpose of this PR is only to add some initial infrastructure, which will be extended in subsequent PRs.

## User-facing API

- `sqlContext.streamFrom` and `df.streamTo` return builder objects that are analogous to the `read/write` interfaces already available for executing queries in a batch-oriented fashion.
- `ContinuousQuery` provides an interface for interacting with a query that is currently executing in the background.

## Internal Interfaces

- `StreamExecution` - executes streaming queries in micro-batches

The following are currently internal, but public APIs will be provided in a future release.

- `Source` - an interface for providers of continually arriving data. A source must have a notion of an `Offset` that monotonically tracks what data has arrived. For fault tolerance, a source must be able to replay data given a start offset.
- `Sink` - an interface that accepts the results of a continuously executing query. It is also responsible for tracking the offset that should be resumed from in the case of a failure.

## Testing

- `MemoryStream` and `MemorySink` - simple implementations of source and sink that keep all data in memory and have methods for simulating durability failures
- `StreamTest` - a framework for performing actions and checking invariants on a continuous query

Author: Michael Armbrust <michael@databricks.com>
Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Josh Rosen <rosenville@gmail.com>

Closes #11006 from marmbrus/structured-streaming.
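To make the source/sink contract described in this message concrete, here is a minimal, self-contained Python sketch of the idea. It is not the Spark implementation and none of it is taken from this patch: the class names `ToyOffset`, `ToySource`, `ToySink`, the function `run_micro_batches`, and every method name are hypothetical, chosen only for illustration. The sketch assumes that the sink records the end offset of each batch it accepts and that the executor always asks the source for data after the sink's last recorded offset, so a lost batch is replayed on the next run.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True, order=True)
class ToyOffset:
    """Hypothetical offset: a monotonically increasing position in a source."""
    value: int


class ToySource:
    """Hypothetical in-memory source; keeping all rows allows replay from any offset."""

    def __init__(self) -> None:
        self._rows: List[int] = []

    def add_data(self, *rows: int) -> ToyOffset:
        """Append rows and return the offset just past the newest row."""
        self._rows.extend(rows)
        return ToyOffset(len(self._rows))

    def get_next_batch(
        self, start: Optional[ToyOffset]
    ) -> Optional[Tuple[ToyOffset, List[int]]]:
        """Return (end offset, rows after `start`), or None when nothing is new."""
        begin = 0 if start is None else start.value
        if begin >= len(self._rows):
            return None
        return ToyOffset(len(self._rows)), self._rows[begin:]


class ToySink:
    """Hypothetical in-memory sink; stores each result batch with its end offset."""

    def __init__(self) -> None:
        self.batches: List[Tuple[ToyOffset, List[int]]] = []

    @property
    def current_offset(self) -> Optional[ToyOffset]:
        """Offset to resume from: the end of the last batch that was kept."""
        return self.batches[-1][0] if self.batches else None

    def add_batch(self, end: ToyOffset, rows: List[int]) -> None:
        self.batches.append((end, rows))

    def drop_last_batch(self) -> None:
        """Simulate a durability failure by losing the most recent batch."""
        self.batches.pop()

    def all_rows(self) -> List[int]:
        return [row for _, rows in self.batches for row in rows]


def run_micro_batches(
    source: ToySource, sink: ToySink, query: Callable[[List[int]], List[int]]
) -> None:
    """Run `query` over new data in micro-batches until the sink has caught up."""
    while True:
        batch = source.get_next_batch(sink.current_offset)
        if batch is None:
            return
        end, rows = batch
        sink.add_batch(end, query(rows))


if __name__ == "__main__":
    source, sink = ToySource(), ToySink()
    source.add_data(1, 2, 3)
    run_micro_batches(source, sink, lambda rows: [r * 2 for r in rows])
    print(sink.all_rows())
```

Because the resume point lives in the sink, dropping the last batch and calling `run_micro_batches` again makes the source replay exactly the rows that were lost, which is the fault-tolerance property the `Source` and `Sink` descriptions ask for.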

Diffstat (limited to 'README.md')

0 files changed, 0 insertions, 0 deletions

