author: Bill Chambers <bill@databricks.com> 2016-05-11 17:42:13 -0700
committer: Andrew Or <andrew@databricks.com> 2016-05-11 17:42:13 -0700
commit: 603f4453a16825cc5773cfe24d6ae4cee5ec949a
tree: 4213331a044ee4881c130a8bed4d96fe1825662b /sql/core/src/test/resources
parent: f14c4ba001fbdbcc9faa46896f1f9d08a7d06609
[SPARK-15264][SPARK-15274][SQL] CSV Reader Error on Blank Column Names
## What changes were proposed in this pull request?

When a CSV begins with:
- `,,` OR
- `"","",`

meaning that the first column names are either empty or blank strings, and `header` is specified to be `true`, then each such column name is replaced with `C` + the index number of that column. For example, if you were to read in the CSV:

```
"","second column"
"hello", "there"
```

then the column names would become `"C0", "second column"`.

This behavior aligns with what currently happens when `header` is specified to be `false` in recent versions of Spark.

### Current Behavior in Spark <=1.6

In Spark <=1.6, a blank column name in a CSV becomes a blank string, `""`, meaning that this column cannot be accessed. However, the CSV reads in without issue.

### Current Behavior in Spark 2.0

Spark throws a `NullPointerException` and will not read in the file.

#### Reproduction in 2.0

https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/346304/2828750690305044/484361/latest.html

## How was this patch tested?

A new test was added to `CSVSuite` to account for this issue. It asserts that both the empty column names and the regular column names can be selected.

Author: Bill Chambers <bill@databricks.com>
Author: Bill Chambers <wchambers@ischool.berkeley.edu>

Closes #13041 from anabranch/master.
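The renaming rule described above can be sketched in a few lines of plain Python. This is only an illustration of the rule as stated in the commit message, not Spark's actual implementation: the helper name `make_safe_header` is hypothetical, and treating only `None` and `""` as blank is an assumption (the message does not say how whitespace-only names are handled).

```python
import csv
import io


def make_safe_header(raw_names, header=True):
    """Hypothetical helper: replace blank column names with 'C' + index.

    Assumption: a name counts as blank when it is None or the empty string.
    With header=False every column gets a positional name, which is the
    behavior the commit message says the fix aligns with.
    """
    if not header:
        return ["C{}".format(i) for i in range(len(raw_names))]
    return [
        name if name not in (None, "") else "C{}".format(i)
        for i, name in enumerate(raw_names)
    ]


# Example from the commit message: "" becomes C0, the named column is kept.
print(make_safe_header(["", "second column"]))
# ['C0', 'second column']

# Header row of the new test fixture, cars-blank-column-name.csv.
fixture_header = next(csv.reader(io.StringIO('"",,make,customer,comment\n')))
print(make_safe_header(fixture_header))
# ['C0', 'C1', 'make', 'customer', 'comment']
```

Because the replacement name is derived from the column's position, two blank headers in the same file never collide with each other, which is what lets the new `CSVSuite` test select both blank columns by name.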
Diffstat (limited to 'sql/core/src/test/resources')
-rw-r--r-- sql/core/src/test/resources/cars-blank-column-name.csv | 3
1 files changed, 3 insertions, 0 deletions
diff --git a/sql/core/src/test/resources/cars-blank-column-name.csv b/sql/core/src/test/resources/cars-blank-column-name.csv
new file mode 100644
index 0000000000..0b804b1614
--- /dev/null
+++ b/sql/core/src/test/resources/cars-blank-column-name.csv
@@ -0,0 +1,3 @@
+"",,make,customer,comment
+2012,"Tesla","S","bill","blank"
+2013,"Tesla","S","c","something"