author | Bill Chambers <bill@databricks.com> | 2016-05-11 17:42:13 -0700 |
---|---|---|
committer | Andrew Or <andrew@databricks.com> | 2016-05-11 17:42:13 -0700 |
commit | 603f4453a16825cc5773cfe24d6ae4cee5ec949a | |
tree | 4213331a044ee4881c130a8bed4d96fe1825662b /python/pyspark/sql/session.py | |
parent | f14c4ba001fbdbcc9faa46896f1f9d08a7d06609 | |
[SPARK-15264][SPARK-15274][SQL] CSV Reader Error on Blank Column Names
## What changes were proposed in this pull request?
When `header` is set to `true` and the header row of a CSV contains empty or blank column names, for example when the file begins with either of:
- `,,`
- `"","",`

then each such column name is replaced with `C` followed by that column's zero-based index. For example, if you read in the CSV:
```
"","second column"
"hello", "there"
```
Then the column names become `C0` and `second column`.
This matches how recent versions of Spark already name columns when `header` is set to `false`.
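A minimal sketch of the renaming rule, written in plain Python with the standard `csv` module rather than Spark's own CSV reader. `header_column_names` and `read_header` are hypothetical helpers for illustration only; the sketch assumes that only empty fields are renamed (whitespace-only names are left alone, since the PR does not mention them).

```python
import csv
import io
from typing import List, Optional


def header_column_names(header_row: List[Optional[str]]) -> List[str]:
    """Hypothetical helper: replace empty header fields with 'C' + zero-based index."""
    return [
        name if name else f"C{index}"  # None or "" becomes C<index>
        for index, name in enumerate(header_row)
    ]


def read_header(text: str) -> List[str]:
    # csv.reader turns both ,, and "","" into empty strings, so both cases
    # reach header_column_names as "".
    first_row = next(csv.reader(io.StringIO(text)))
    return header_column_names(first_row)


sample = '"","second column"\n"hello", "there"\n'
print(read_header(sample))  # ['C0', 'second column']
```

Because both the unquoted (`,,`) and quoted (`"",""`) forms parse to the same empty string, a single emptiness check covers both cases described above.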
### Current Behavior in Spark <=1.6
In Spark <=1.6, a blank column name is kept as the empty string `""`, so that column cannot be accessed by name. However, the file still loads without error.
### Current Behavior in Spark 2.0
Spark throws a `NullPointerException` and does not read the file.
#### Reproduction in 2.0
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/346304/2828750690305044/484361/latest.html
## How was this patch tested?
A new test in `CSVSuite` covers this case. It asserts that both the renamed (formerly empty) columns and the regular columns can be selected.
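The shape of that check can be approximated outside Spark. The sketch below is hypothetical Python, not the actual Scala `CSVSuite` test: `load_rows` and `select` are illustrative helpers, and `skipinitialspace=True` is an assumption added so that the space after the comma in the sample's second row is ignored.

```python
import csv
import io
from typing import Dict, List


def load_rows(text: str) -> List[Dict[str, str]]:
    """Hypothetical helper: parse CSV with a header, renaming empty names to C<index>."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = next(reader)
    names = [name if name else f"C{i}" for i, name in enumerate(header)]
    return [dict(zip(names, row)) for row in reader]


def select(rows: List[Dict[str, str]], column: str) -> List[str]:
    # Rough analogue of selecting one column; raises KeyError if the name is absent.
    return [row[column] for row in rows]


rows = load_rows('"","second column"\n"hello", "there"\n')
# Both the renamed column and the regular column are selectable.
assert select(rows, "C0") == ["hello"]
assert select(rows, "second column") == ["there"]
```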
Author: Bill Chambers <bill@databricks.com>
Author: Bill Chambers <wchambers@ischool.berkeley.edu>
Closes #13041 from anabranch/master.