

author	Yin Huai <yhuai@databricks.com>	2015-03-02 23:18:07 +0800
committer	Cheng Lian <lian@databricks.com>	2015-03-02 23:18:07 +0800
commit	3efd8bb6cf139ce094ff631c7a9c1eb93fdcd566 (patch)
tree	bd29d3d61cc3355c4de6f5dfcd2fdefe4533e610 /examples
parent	39a54b40aff66816f8b8f5c6133eaaad6eaecae1 (diff)

[SPARK-6052][SQL] In JSON schema inference, we should always set containsNull of an ArrayType to true

Always set `containsNull = true` when inferring the schema of JSON datasets. If we set `containsNull` based on the records we scanned, we may miss arrays with null values when we sample. Also, because future data can have arrays with null values, always setting `containsNull = true` is the more robust choice when we convert JSON data to Parquet.

JIRA: https://issues.apache.org/jira/browse/SPARK-6052

Author: Yin Huai <yhuai@databricks.com>

Closes #4806 from yhuai/jsonArrayContainsNull and squashes the following commits:

05eab9d [Yin Huai] Change containsNull to true.
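
The sampling concern described in the commit message can be illustrated with a minimal sketch. This is a toy model in plain Python, not Spark's inference code: the `ArrayType` dataclass, the `infer_array_type` function, and the `always_nullable` flag are hypothetical names invented for this example, and the element type is simplified to a Python type name.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayType:
    """Toy stand-in for an array type (hypothetical, not Spark's class)."""
    element_type: str
    contains_null: bool


def infer_array_type(records, field, always_nullable):
    """Infer a toy ArrayType for `field` from JSON lines.

    With always_nullable=False, contains_null reflects only the records
    that were scanned. With always_nullable=True, it is set regardless
    of what was scanned, which is the behavior this commit argues for.
    """
    saw_null = False
    element_type = "null"
    for line in records:
        for value in json.loads(line).get(field) or []:
            if value is None:
                saw_null = True
            else:
                element_type = type(value).__name__
    return ArrayType(element_type, True if always_nullable else saw_null)


records = ['{"tags": ["a", "b"]}', '{"tags": ["c", null]}']
sample = records[:1]  # a sample that happens to miss the record with a null

observed = infer_array_type(sample, "tags", always_nullable=False)
robust = infer_array_type(sample, "tags", always_nullable=True)
full_scan = infer_array_type(records, "tags", always_nullable=False)
```

In this sketch `observed` claims the array holds no nulls, although the second record contains one, while `robust` and `full_scan` both allow nulls. A schema inferred from the sample alone would therefore be too strict for the full dataset.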

Diffstat (limited to 'examples')

0 files changed, 0 insertions, 0 deletions

