[SPARK-19633][SS] FileSource read from FileSink - spark

author	Liwei Lin <lwlin7@gmail.com>	2017-02-28 22:58:51 -0800
committer	Shixiong Zhu <shixiong@databricks.com>	2017-02-28 22:58:51 -0800
commit	4913c92c2fbfcc22b41afb8ce79687165392d7da (patch)
tree	3879e2eed39d386aaf67383b7f6abdb170e923f0 /mllib
parent	89cd3845b6edb165236a6498dcade033975ee276 (diff)
download	spark-4913c92c2fbfcc22b41afb8ce79687165392d7da.tar.gz spark-4913c92c2fbfcc22b41afb8ce79687165392d7da.tar.bz2 spark-4913c92c2fbfcc22b41afb8ce79687165392d7da.zip

[SPARK-19633][SS] FileSource read from FileSink

## What changes were proposed in this pull request?

Right now file source always uses `InMemoryFileIndex` to scan files from a given path.

But when reading the outputs from another streaming query, the file source should use `MetadataFileIndex` to list files from the sink log. This patch adds this support.

## `MetadataFileIndex` or `InMemoryFileIndex`

```scala
spark
  .readStream
  .format(...)
  .load("/some/path") // for a non-glob path:
                      //   - use `MetadataFileIndex` when `/some/path/_spark_meta` exists
                      //   - fall back to `InMemoryFileIndex` otherwise
```

```scala
spark
  .readStream
  .format(...)
  .load("/some/path/*/*") // for a glob path: always use `InMemoryFileIndex`
```

## How was this patch tested?

two newly added tests

Author: Liwei Lin <lwlin7@gmail.com>

Closes #16987 from lw-lin/source-read-from-sink.
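
## Sketch of the selection rule

The rule described above can be illustrated with a small standalone sketch. This is not the code from the patch: `IndexChoice`, `isGlob`, `chooseIndex`, and the two `IndexKind` cases are hypothetical names introduced only for illustration, the glob check is a simple character test, and the metadata directory name is taken as written in this message and left as a parameter.

```scala
import java.nio.file.{Files, Paths}

object IndexChoice {
  // Hypothetical stand-ins for the two file index implementations.
  sealed trait IndexKind
  case object MetadataLogIndex extends IndexKind // lists files from the sink log
  case object InMemoryIndex extends IndexKind    // scans files under the path

  // Assumption: a path counts as a glob if it contains any of these characters.
  def isGlob(path: String): Boolean =
    path.exists(c => "{}[]*?\\".indexOf(c) >= 0)

  // Glob paths always get the in-memory index. A non-glob path gets the
  // metadata-log index only when the sink's metadata directory exists under it.
  def chooseIndex(path: String, metadataDirName: String = "_spark_meta"): IndexKind =
    if (isGlob(path)) InMemoryIndex
    else if (Files.exists(Paths.get(path).resolve(metadataDirName))) MetadataLogIndex
    else InMemoryIndex
}
```

With this sketch, `IndexChoice.chooseIndex("/some/path/*/*")` yields `InMemoryIndex` regardless of what is on disk, while `IndexChoice.chooseIndex("/some/path")` depends on whether the metadata directory is present under `/some/path`.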

Diffstat (limited to 'mllib')

0 files changed, 0 insertions, 0 deletions

