author: Tathagata Das <tathagata.das1565@gmail.com> 2016-06-20 14:52:28 -0700
committer: Shixiong Zhu <shixiong@databricks.com> 2016-06-20 14:52:28 -0700
commit: b99129cc452defc266f6d357f5baab5f4ff37a36
tree: de4e6e356930aeacee94b541530be063d178707c /python
parent: 6df8e3886063a9d8c2e8499456ea9166245d5640
[SPARK-15982][SPARK-16009][SPARK-16007][SQL] Harmonize the behavior of DataFrameReader.text/csv/json/parquet/orc
## What changes were proposed in this pull request?

Issues with current reader behavior:
- `text()` without args returns an empty DF with no columns. This is inconsistent: `text` is expected to always return a DF with a `value` string field.
- `textFile()` without args fails with an exception for the same reason: it expects the DF returned by `text()` to have a `value` field.
- `orc()` does not take varargs, which is inconsistent with the other formats.
- `json(single-arg)` was removed, but that caused source compatibility issues ([SPARK-16009](https://issues.apache.org/jira/browse/SPARK-16009)).
- A user-specified schema was not respected when `text/csv/...` were used with no args ([SPARK-16007](https://issues.apache.org/jira/browse/SPARK-16007)).

The solution I am implementing is to do the following:
- For each format, there will be a single-argument method and a vararg method. For json, parquet, csv, text, this means adding json(string), etc. For orc, this means adding orc(varargs).
- Remove the special handling of text(), csv(), etc. that returns an empty dataframe with no fields. Instead, pass the empty sequence of paths on to the data source and let each data source handle it correctly. For example, the text data source should return an empty DF with schema (value: string).
- Deduped docs and fixed their formatting.

## How was this patch tested?

Added new unit tests for Scala and Java.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #13727 from tdas/SPARK-15982.
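The intended behavior can be illustrated with a small toy model. This is only a sketch and does not use Spark: `ToyFrame`, `ToyReader`, `text_source`, and the in-memory `files` mapping are all hypothetical names invented for illustration. It shows the two ideas from the description: the reader has no special case for an empty path list, and the data source itself decides the schema, honoring a user-specified schema when one is given.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical schema representation: (column name, type name) pairs.
Schema = List[Tuple[str, str]]


@dataclass
class ToyFrame:
    """Hypothetical stand-in for a DataFrame: a schema plus rows."""
    schema: Schema
    rows: List[tuple] = field(default_factory=list)


def text_source(files: Dict[str, List[str]],
                paths: List[str],
                user_schema: Optional[Schema]) -> ToyFrame:
    """Toy text data source.

    With no paths it still returns a frame with the default
    (value: string) schema, or the user-specified schema if present.
    """
    schema = user_schema if user_schema is not None else [("value", "string")]
    rows = [(line,) for path in paths for line in files[path]]
    return ToyFrame(schema, rows)


class ToyReader:
    """Toy reader that forwards paths to the source without special cases."""

    def __init__(self, files: Dict[str, List[str]]):
        self._files = files          # assumption: in-memory "file system"
        self._schema: Optional[Schema] = None

    def schema(self, schema: Schema) -> "ToyReader":
        self._schema = schema
        return self

    def text(self, *paths: str) -> ToyFrame:
        # Zero, one, or many paths all take the same route: the
        # (possibly empty) sequence is passed on to the data source.
        return text_source(self._files, list(paths), self._schema)
```

In this model, `ToyReader(files).text()` yields an empty frame whose schema is still `[("value", "string")]`, and calling `.schema(...)` first replaces that default even when no paths are passed.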
Diffstat (limited to 'python')
0 files changed, 0 insertions, 0 deletions