author | Stephen De Gennaro <stepheng@realitymine.com> | 2015-10-26 19:55:10 -0700 |
---|---|---|
committer | Yin Huai <yhuai@databricks.com> | 2015-10-26 19:55:10 -0700 |
commit | 82464fb2e02ca4e4d425017815090497b79dc93f | |
tree | 38092b8f33e55405a41fcc00512e57b08f5fc0d8 /R/pkg/NAMESPACE | |
parent | d4c397a64af4cec899fdaa3e617ed20333cc567d | |
[SPARK-10947] [SQL] With schema inference from JSON into a Dataframe, add option to infer all primitive object types as strings
Currently, when a schema is inferred from a JSON file using sqlContext.read.json, primitive values are inferred as string, long, boolean, etc.
However, because JSON does not enforce types itself, an inferred type can be too specific, which can cause problems when merging DataFrame schemas.
This pull request adds the option "primitivesAsString" to the JSON DataFrameReader. When set to true (it defaults to false if not set), all primitives are inferred as strings.
Below is an example of the new option in use.
```
scala> val jsonDf = sqlContext.read.option("primitivesAsString", "true").json(sampleJsonFile)
scala> jsonDf.printSchema()
root
|-- bigInteger: string (nullable = true)
|-- boolean: string (nullable = true)
|-- double: string (nullable = true)
|-- integer: string (nullable = true)
|-- long: string (nullable = true)
|-- null: string (nullable = true)
|-- string: string (nullable = true)
```
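To make the motivation concrete, the following is a minimal sketch (not part of the patch itself) that reads two small JSON batches whose fields carry different primitive types, first with default inference and then with "primitivesAsString". It assumes the Spark 1.x-style `SQLContext` API used in the example above and a local master; the object name, helper names, and sample records are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical sketch: names and sample data are illustrative, not from the patch.
object PrimitivesAsStringSketch {

  // Default inference: primitives become long, boolean, string, and so on.
  def readDefault(sqlContext: SQLContext, lines: Seq[String]): DataFrame =
    sqlContext.read.json(sqlContext.sparkContext.parallelize(lines))

  // With the new option, every primitive value is inferred as a string.
  def readAllStrings(sqlContext: SQLContext, lines: Seq[String]): DataFrame =
    sqlContext.read
      .option("primitivesAsString", "true")
      .json(sqlContext.sparkContext.parallelize(lines))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("primitivesAsString-sketch")
      .setMaster("local[2]") // assumption: run locally for the demonstration
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    try {
      // Two batches describing the same fields with different JSON types.
      val batchA = Seq("""{"id": 1, "active": true}""")
      val batchB = Seq("""{"id": "1", "active": "yes"}""")

      // Default inference: the two schemas are inferred independently.
      readDefault(sqlContext, batchA).printSchema()
      readDefault(sqlContext, batchB).printSchema()

      // With primitivesAsString, both batches are read with string columns.
      val a = readAllStrings(sqlContext, batchA)
      val b = readAllStrings(sqlContext, batchB)
      a.printSchema()
      b.printSchema()
      println(s"schemas match: ${a.schema == b.schema}")
    } finally {
      sc.stop()
    }
  }
}
```

Reading each batch separately mirrors the situation described above, where DataFrames produced at different times must later be combined. Whether all-string columns are acceptable depends on the downstream job, since any numeric or boolean handling then has to be done with explicit casts.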
Author: Stephen De Gennaro <stepheng@realitymine.com>
Closes #9249 from stephend-realitymine/stephend-primitives.