[SPARK-13149][SQL] Add FileStreamSource - spark

author	Shixiong Zhu <shixiong@databricks.com>	2016-02-09 18:50:06 -0800
committer	Tathagata Das <tathagata.das1565@gmail.com>	2016-02-09 18:50:06 -0800
commit	b385ce38825de4b1420c5a0e8191e91fc8afecf5 (patch)
tree	ef988edcab7bdbf37082d07781b5addd9c3a364c /bin/load-spark-env.cmd
parent	6f710f9fd4f85370557b7705020ff16f2385e645 (diff)

[SPARK-13149][SQL] Add FileStreamSource

`FileStreamSource` is an implementation of `org.apache.spark.sql.execution.streaming.Source`. It takes advantage of the existing `HadoopFsRelationProvider` to support various file formats. It remembers the files in each batch and stores them in metadata files so that they can be recovered on restart. The metadata files are stored in the file system. A further PR will clean up the metadata files periodically.

This is based on the initial work from marmbrus.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #11034 from zsxwing/stream-df-file-source.
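
The batch bookkeeping described in the commit message can be illustrated with a small, language-neutral sketch. This is not Spark's code or API: the class `FileBatchLog`, its methods, and the one-JSON-file-per-batch layout are all hypothetical and chosen only to show the idea of recording each batch's files in metadata files and reloading them after a restart. It assumes the metadata directory lives outside the data directory.

```python
import json
import os


class FileBatchLog:
    """Hypothetical helper (not a Spark class): tracks which files form each batch."""

    def __init__(self, data_dir, metadata_dir):
        self.data_dir = data_dir
        self.metadata_dir = metadata_dir  # assumed to be outside data_dir
        os.makedirs(metadata_dir, exist_ok=True)
        self.batches = {}  # batch id -> sorted list of file names
        self._recover()

    def _recover(self):
        # Reload every batch written by a previous run.
        # Metadata files are named by batch id: "0", "1", ...
        for name in os.listdir(self.metadata_dir):
            if name.isdigit():
                with open(os.path.join(self.metadata_dir, name)) as f:
                    self.batches[int(name)] = json.load(f)

    def seen_files(self):
        return {f for files in self.batches.values() for f in files}

    def next_batch(self):
        # Files present in data_dir that no earlier batch has claimed.
        present = {
            name for name in os.listdir(self.data_dir)
            if os.path.isfile(os.path.join(self.data_dir, name))
        }
        new_files = sorted(present - self.seen_files())
        if not new_files:
            return None
        batch_id = max(self.batches, default=-1) + 1
        # Write to a temporary file first, then rename, so a crash
        # cannot leave a half-written metadata file under the final name.
        tmp_path = os.path.join(self.metadata_dir, f".{batch_id}.tmp")
        with open(tmp_path, "w") as f:
            json.dump(new_files, f)
        os.replace(tmp_path, os.path.join(self.metadata_dir, str(batch_id)))
        self.batches[batch_id] = new_files
        return batch_id, new_files
```

Constructing a second `FileBatchLog` over the same metadata directory simulates a restart: batches recorded earlier are reloaded, so only files that arrived afterwards end up in the next batch. Because nothing ever deletes the per-batch files in this sketch, they accumulate over time, which is the kind of growth the planned cleanup PR is meant to address.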

Diffstat (limited to 'bin/load-spark-env.cmd')

0 files changed, 0 insertions, 0 deletions

