[SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch - spark


author	Wenchen Fan <wenchen@databricks.com>	2016-10-21 15:28:16 -0700
committer	Shixiong Zhu <shixiong@databricks.com>	2016-10-21 15:28:16 -0700
commit	140570252fd3739d6bdcadd6d4d5a180e480d3e0 (patch)
tree	35b5e05d85bfbf8f16fdc94d9e2102cbc5e256b9 /external
parent	c1f344f1a09b8834bec70c1ece30b9bff63e55ea (diff)


## What changes were proposed in this pull request?

In `FileStreamSource.getBatch`, we create a `DataSource` with the specified schema, to avoid inferring the schema again in every batch. However, we don't pass the partition columns, so the partitioning is still inferred again in every batch. This PR fixes that by keeping the partition columns in `FileStreamSource`, as is already done for the schema.

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes #15581 from cloud-fan/stream.
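
The idea of keeping the partition columns alongside the schema can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from this patch: `ToyFileStreamSource`, `ToyDataSource`, `inferPartitionColumns`, and `inferenceCount` are hypothetical stand-ins that do not exist in Spark, and the "inference" here simply reads `key=value` directory names out of file paths.

```scala
// Toy model only: every name below is a hypothetical stand-in, not Spark's real API.
object PartitionCacheSketch {
  // Counts how often the (notionally expensive) partition inference runs.
  var inferenceCount = 0

  // Stand-in for partition inference: collect the keys of `key=value` path segments.
  def inferPartitionColumns(paths: Seq[String]): Seq[String] = {
    inferenceCount += 1
    paths
      .flatMap(_.split("/"))
      .filter(_.contains("="))
      .map(_.takeWhile(_ != '='))
      .distinct
  }

  // Stand-in for the per-batch data source, which accepts precomputed metadata.
  final case class ToyDataSource(
      paths: Seq[String],
      schema: Seq[String],
      partitionColumns: Seq[String])

  // Stand-in for the streaming source: schema and partition columns are fixed
  // at construction time and handed to every batch instead of being re-inferred.
  final class ToyFileStreamSource(
      schema: Seq[String],
      partitionColumns: Seq[String]) {
    def getBatch(newFiles: Seq[String]): ToyDataSource =
      ToyDataSource(newFiles, schema, partitionColumns)
  }

  def main(args: Array[String]): Unit = {
    val initialFiles = Seq("/data/year=2016/month=10/a.json")
    // Inference happens once, when the source is set up.
    val partitionColumns = inferPartitionColumns(initialFiles)
    val source =
      new ToyFileStreamSource(Seq("value", "year", "month"), partitionColumns)

    val batch1 = source.getBatch(Seq("/data/year=2016/month=10/b.json"))
    val batch2 = source.getBatch(Seq("/data/year=2016/month=11/c.json"))

    // Both batches reuse the same partition columns; the counter stays at 1.
    println(batch1.partitionColumns == batch2.partitionColumns)
    println(inferenceCount)
  }
}
```

In this toy model, `getBatch` never calls `inferPartitionColumns`, so the cost of inference is paid once regardless of how many batches are processed.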

Diffstat (limited to 'external')

0 files changed, 0 insertions, 0 deletions

