author | Shixiong Zhu <shixiong@databricks.com> | 2016-02-09 18:50:06 -0800 |
---|---|---|
committer | Tathagata Das <tathagata.das1565@gmail.com> | 2016-02-09 18:50:06 -0800 |
commit | b385ce38825de4b1420c5a0e8191e91fc8afecf5 (patch) | |
tree | ef988edcab7bdbf37082d07781b5addd9c3a364c /bin/spark-shell2.cmd | |
parent | 6f710f9fd4f85370557b7705020ff16f2385e645 (diff) | |
[SPARK-13149][SQL] Add FileStreamSource
`FileStreamSource` is an implementation of `org.apache.spark.sql.execution.streaming.Source`. It uses the existing `HadoopFsRelationProvider` to support various file formats. It records the files that belong to each batch and writes that list to metadata files, so the batches can be recovered after a restart. The metadata files are stored in the file system; a follow-up PR will clean them up periodically.
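The batch-tracking idea described above (remember which files make up each batch, persist that list, reload it on restart) can be sketched as follows. This is a minimal, hypothetical illustration, not Spark's code: the class name `FileSourceSketch`, its methods, and the one-JSON-file-per-batch metadata layout are all assumptions, and the real source reads file contents through `HadoopFsRelationProvider`, which is not modeled here.

```python
import json
import os
from typing import Dict, List, Optional, Set


class FileSourceSketch:
    """Hypothetical file-based streaming source (illustration only).

    Each batch is the set of files that appeared in `data_dir` since the
    previous batch. The batch-id -> files mapping is written to
    `metadata_dir` (assumed layout: one JSON file per batch id) so that a
    restarted source can recover which files were already processed.
    """

    def __init__(self, data_dir: str, metadata_dir: str) -> None:
        self.data_dir = data_dir
        self.metadata_dir = metadata_dir
        os.makedirs(metadata_dir, exist_ok=True)
        # Recover previously committed batches from the metadata directory.
        self.batches: Dict[int, List[str]] = {}
        for name in os.listdir(metadata_dir):
            if name.endswith(".json"):
                with open(os.path.join(metadata_dir, name)) as f:
                    self.batches[int(name[: -len(".json")])] = json.load(f)
        # Every file already assigned to some batch counts as seen.
        self.seen: Set[str] = {p for files in self.batches.values() for p in files}

    def _write_batch(self, batch_id: int, files: List[str]) -> None:
        # Write to a temp file, then atomically rename it into place,
        # so a crash never leaves a half-written metadata entry.
        final = os.path.join(self.metadata_dir, f"{batch_id}.json")
        tmp = final + ".tmp"
        with open(tmp, "w") as f:
            json.dump(files, f)
        os.replace(tmp, final)

    def get_next_batch(self) -> Optional[int]:
        """Record any unseen files as a new batch; return its id, or None."""
        new_files = sorted(
            name for name in os.listdir(self.data_dir) if name not in self.seen
        )
        if not new_files:
            return None
        batch_id = max(self.batches, default=-1) + 1
        # Persist first, then update in-memory state.
        self._write_batch(batch_id, new_files)
        self.batches[batch_id] = new_files
        self.seen.update(new_files)
        return batch_id

    def files_for(self, batch_id: int) -> List[str]:
        """Return the file names recorded for a committed batch."""
        return self.batches[batch_id]
```

Persisting the batch entry before updating in-memory state means a restarted `FileSourceSketch` pointed at the same `metadata_dir` sees exactly the batches that were durably recorded, and does not hand out an already-processed file again.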
This is based on the initial work from marmbrus.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #11034 from zsxwing/stream-df-file-source.