[SPARK-19641][SQL] JSON schema inference in DROPMALFORMED mode produces incorrect schema for non-array/object JSONs - spark


author	hyukjinkwon <gurwls223@gmail.com>	2017-04-03 17:44:39 +0800
committer	Wenchen Fan <wenchen@databricks.com>	2017-04-03 17:44:39 +0800
commit	4fa1a43af6b5a6abaef7e04cacb2617a2e92d816 (patch)
tree	c07f8d0c8cdfa086f1734004fb8f11f9526c41d7 /sql/hive
parent	4d28e8430d11323f08657ca8f3251ca787c45501 (diff)
download	spark-4fa1a43af6b5a6abaef7e04cacb2617a2e92d816.tar.gz spark-4fa1a43af6b5a6abaef7e04cacb2617a2e92d816.tar.bz2 spark-4fa1a43af6b5a6abaef7e04cacb2617a2e92d816.zip

[SPARK-19641][SQL] JSON schema inference in DROPMALFORMED mode produces incorrect schema for non-array/object JSONs

## What changes were proposed in this pull request?

Currently, when we infer the types for valid JSON strings that are neither objects nor arrays, we produce empty schemas regardless of the parse mode, as shown below:

```scala
scala> spark.read.option("mode", "DROPMALFORMED").json(Seq("""{"a": 1}""", """"a"""").toDS).printSchema()
root
```

```scala
scala> spark.read.option("mode", "FAILFAST").json(Seq("""{"a": 1}""", """"a"""").toDS).printSchema()
root
```

This PR proposes to handle parse modes in type inference.

After this PR,

```scala
scala> spark.read.option("mode", "DROPMALFORMED").json(Seq("""{"a": 1}""", """"a"""").toDS).printSchema()
root
 |-- a: long (nullable = true)
```

```
scala> spark.read.option("mode", "FAILFAST").json(Seq("""{"a": 1}""", """"a"""").toDS).printSchema()
java.lang.RuntimeException: Failed to infer a common schema. Struct types are expected but string was found.
```

This PR is based on https://github.com/NathanHowell/spark/commit/e233fd03346a73b3b447fa4c24f3b12c8b2e53ae, and NathanHowell and I discussed it in https://issues.apache.org/jira/browse/SPARK-19641.

## How was this patch tested?

Unit tests in `JsonSuite` for both `DROPMALFORMED` and `FAILFAST` modes.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17492 from HyukjinKwon/SPARK-19641.
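For readers who want to see the idea in isolation, the mode handling described above can be sketched roughly as follows. This is a simplified, self-contained illustration, not the code in this patch: `JType`, `JStruct`, `Mode`, `InferSketch`, and `inferRoot` are hypothetical stand-ins for Spark's internal types. The `Permissive` branch, which adds a corrupt-record column, is an assumption about the default mode and is not described in this commit message.

```scala
// Hypothetical, simplified stand-ins for Spark's data types and parse modes.
sealed trait JType { def name: String }
case object JLong extends JType { val name = "long" }
case object JString extends JType { val name = "string" }
case class JStruct(fields: Map[String, JType]) extends JType { val name = "struct" }

sealed trait Mode
case object Permissive extends Mode
case object DropMalformed extends Mode
case object FailFast extends Mode

object InferSketch {
  // Merge the per-record root types into one struct schema,
  // deciding per parse mode what to do with non-struct roots.
  def inferRoot(
      types: Seq[JType],
      mode: Mode,
      corruptCol: String = "_corrupt_record"): JStruct =
    types.foldLeft(JStruct(Map.empty)) {
      // Struct roots contribute their fields to the merged schema.
      case (acc, JStruct(fields)) => JStruct(acc.fields ++ fields)
      // Any other root type (e.g. a bare string) is handled by mode.
      case (acc, other) => mode match {
        // Assumption: keep the struct and add a corrupt-record column.
        case Permissive => JStruct(acc.fields + (corruptCol -> JString))
        // Ignore the record so it does not affect the schema.
        case DropMalformed => acc
        // Refuse to infer a schema at all.
        case FailFast =>
          throw new RuntimeException(
            "Failed to infer a common schema. Struct types are expected " +
              s"but ${other.name} was found.")
      }
    }
}
```

With the two records from the examples above, `InferSketch.inferRoot(Seq(JStruct(Map("a" -> JLong)), JString), DropMalformed)` keeps only the field `a`, while the same call with `FailFast` throws.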

Diffstat (limited to 'sql/hive')

0 files changed, 0 insertions, 0 deletions

