author | Yin Huai <yhuai@databricks.com> | 2015-12-16 23:18:53 -0800 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2015-12-16 23:18:53 -0800 |
commit | 9d66c4216ad830812848c657bbcd8cd50949e199 (patch) | |
tree | 05e072cb1e4b77d67ec33fec6d30cf4b2a23e361 /core/src | |
parent | 437583f692e30b8dc03b339a34e92595d7b992ba (diff) | |
[SPARK-12057][SQL] Prevent failure on corrupt JSON records
This PR makes the JSON parser and schema inference handle more cases where records cannot be parsed. It is based on #10043. The last commit fixes the failed test and updates the schema inference logic.
Regarding the schema inference change, if we have something like
```
{"f1":1}
[1,2,3]
```
originally, we would get a DF without any columns.
After this change, we will get a DF with columns `f1` and `_corrupt_record`. Basically, for the second row, `[1,2,3]` will be the value of `_corrupt_record`.
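The behavior described above can be illustrated with a small standalone sketch. This is not Spark's implementation: it uses only Python's `json` module, and the helper names `parse_record`, `infer_columns`, and `to_rows` are hypothetical. It assumes that any line which is not a JSON object is treated as corrupt and kept verbatim under `_corrupt_record`.

```python
import json

# Column name taken from the commit message.
CORRUPT_COLUMN = "_corrupt_record"


def parse_record(line):
    """Return a row dict; anything that is not a JSON object is kept as a corrupt record."""
    try:
        value = json.loads(line)
    except json.JSONDecodeError:
        return {CORRUPT_COLUMN: line}
    if isinstance(value, dict):
        return value
    # Valid JSON, but not an object (e.g. a top-level array): treat as corrupt.
    return {CORRUPT_COLUMN: line}


def infer_columns(lines):
    """Collect column names in first-seen order across all records."""
    columns = []
    for line in lines:
        for name in parse_record(line):
            if name not in columns:
                columns.append(name)
    return columns


def to_rows(lines):
    """Return (columns, rows), filling missing values with None."""
    columns = infer_columns(lines)
    rows = [[parse_record(line).get(col) for col in columns] for line in lines]
    return columns, rows


lines = ['{"f1":1}', '[1,2,3]']
columns, rows = to_rows(lines)
print(columns)  # ['f1', '_corrupt_record']
print(rows)     # [[1, None], [None, '[1,2,3]']]
```

In this sketch the second input line ends up as the string `[1,2,3]` in the `_corrupt_record` column, mirroring the example in the commit message.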
When merging this PR, please make sure that the author is simplyianm.
JIRA: https://issues.apache.org/jira/browse/SPARK-12057
Closes #10043
Author: Ian Macalinao <me@ian.pw>
Author: Yin Huai <yhuai@databricks.com>
Closes #10288 from yhuai/handleCorruptJson.