author | gatorsmile <gatorsmile@gmail.com> | 2016-06-13 19:31:40 -0700 |
---|---|---|
committer | Yin Huai <yhuai@databricks.com> | 2016-06-13 19:31:40 -0700 |
commit | 5827b65e28da168286c771c53a38620d79f5e74f (patch) | |
tree | 76cbbd8d418dfc2e539df6187f53527bce4f9054 /docs | |
parent | 7b9071eeaa62fd9a51d9e94cfd479224b8341517 (diff) | |
download | spark-5827b65e28da168286c771c53a38620d79f5e74f.tar.gz spark-5827b65e28da168286c771c53a38620d79f5e74f.tar.bz2 spark-5827b65e28da168286c771c53a38620d79f5e74f.zip |
[SPARK-15808][SQL] File Format Checking When Appending Data
#### What changes were proposed in this pull request?
**Issue:** Appending data to a table with a file format different from the one the table was created with produces either wrong results or confusing errors.
_Example 1: Parquet -> ORC_
```Scala
createDF(0, 9).write.format("parquet").saveAsTable("appendParquetToOrc")
createDF(10, 19).write.mode(SaveMode.Append).format("orc").saveAsTable("appendParquetToOrc")
```
Resulting error:
```
Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost): java.lang.RuntimeException: file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzw0000gn/T/warehouse-bc8fedf2-aa6a-4002-a18b-524c6ac859d4/appendorctoparquet/part-r-00000-c0e3f365-1d46-4df5-a82c-b47d7af9feb9.snappy.orc is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [79, 82, 67, 23]
```
_Example 2: JSON -> Parquet_
```Scala
createDF(0, 9).write.format("json").saveAsTable("appendJsonToCSV")
createDF(10, 19).write.mode(SaveMode.Append).format("parquet").saveAsTable("appendJsonToCSV")
```
No exception is thrown, but the query returns wrong results:
```
+----+----+
| c1| c2|
+----+----+
|null|null|
|null|null|
|null|null|
|null|null|
| 0|str0|
| 1|str1|
| 2|str2|
| 3|str3|
| 4|str4|
| 5|str5|
| 6|str6|
| 7|str7|
| 8|str8|
| 9|str9|
+----+----+
```
_Example 3: JSON -> Text_
```Scala
createDF(0, 9).write.format("json").saveAsTable("appendJsonToText")
createDF(10, 19).write.mode(SaveMode.Append).format("text").saveAsTable("appendJsonToText")
```
Resulting error:
```
Text data source supports only a single column, and you have 2 columns.
```
This PR detects the format mismatch when appending and throws an exception with an appropriate error message.
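As a rough illustration of the kind of check described above, the sketch below compares the format a table was created with against the format requested for the append and fails fast on a mismatch. It is a standalone sketch, not Spark's implementation: `FormatCheckSketch`, `checkAppendFormat`, `FormatMismatchException`, and the alias table are all hypothetical names. Spark resolves format names to data source implementations internally; the small alias map here only stands in for that step.

```scala
// Hypothetical sketch of an append-time file format check; NOT Spark's actual code.
object FormatCheckSketch {

  // Hypothetical exception type used only in this sketch.
  final class FormatMismatchException(msg: String) extends RuntimeException(msg)

  // Hypothetical alias table: maps long names to short names so that
  // equivalent spellings of the same format compare as equal.
  private val aliases: Map[String, String] = Map(
    "org.apache.spark.sql.parquet" -> "parquet",
    "org.apache.spark.sql.json" -> "json"
  )

  // Case-insensitive normalization followed by alias resolution.
  private def normalize(format: String): String = {
    val lower = format.trim.toLowerCase
    aliases.getOrElse(lower, lower)
  }

  /** Throws if `requested` does not match the format `table` was created with. */
  def checkAppendFormat(table: String, existing: String, requested: String): Unit = {
    if (normalize(existing) != normalize(requested)) {
      throw new FormatMismatchException(
        s"Cannot append to table $table: it was created with format '$existing', " +
          s"but the append specified '$requested'.")
    }
  }
}
```

With a check like this, Example 1 (`existing = "parquet"`, `requested = "orc"`) would fail at write time with a clear message, instead of failing later when the mixed files are read back.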
#### How was this patch tested?
Added test cases.
Author: gatorsmile <gatorsmile@gmail.com>
Closes #13546 from gatorsmile/fileFormatCheck.
Diffstat (limited to 'docs')
0 files changed, 0 insertions, 0 deletions