author | windpiger <songjun@outlook.com> | 2017-02-08 14:30:28 +0800 |
---|---|---|
committer | Wenchen Fan <wenchen@databricks.com> | 2017-02-08 14:30:28 +0800 |
commit | d60dde26f98164ae146da1b5f409f4eb7c3621aa (patch) | |
tree | 477654049b435a3aefd5bd1e8e0a997de47b6c23 /sql/core/src/test/scala/org | |
parent | 5a0569ce693c635c5fa12b2de33ed3643ce888e3 (diff) | |
[SPARK-19488][SQL] fix csv infer schema when the field is Nan/Inf etc
## What changes were proposed in this pull request?
When inferring a CSV schema, Spark does not use the user-defined `CSVOptions` to parse a field, so values such as `inf` and `-inf`, which should be parsed as `DoubleType`, are not recognized.
This PR adds checks against `options.nanValue`, `options.negativeInf`, and `options.positiveInf` when deciding whether a field is a `DoubleType`.
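The check described above can be illustrated with a minimal, standalone sketch. It is not the code from this patch: `InfOptions` and `DoubleInferenceSketch.isDoubleLike` are hypothetical names introduced here for illustration, and the sketch assumes that a field counts as a double when it either parses as one or equals one of the three user-defined tokens.

```scala
import scala.util.Try

// Hypothetical stand-in for the three CSV options named above.
// This is NOT Spark's CSVOptions class.
final case class InfOptions(
    nanValue: String,
    positiveInf: String,
    negativeInf: String)

object DoubleInferenceSketch {
  // Assumption: a field is "double-like" when it parses as a double
  // or matches one of the user-defined NaN/infinity tokens exactly.
  def isDoubleLike(field: String, options: InfOptions): Boolean =
    Try(field.toDouble).isSuccess ||
      field == options.nanValue ||
      field == options.positiveInf ||
      field == options.negativeInf
}
```

With `InfOptions("nan", "inf", "-inf")`, the tokens `nan`, `inf`, and `-inf` are all accepted even though `"nan".toDouble` and `"inf".toDouble` throw on their own, which is the gap the user-defined options are meant to close.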
## How was this patch tested?
unit test added
Author: windpiger <songjun@outlook.com>
Closes #16834 from windpiger/fixinferInfSchemaCsv.
Diffstat (limited to 'sql/core/src/test/scala/org')
-rw-r--r-- | sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala | 8 |
1 files changed, 8 insertions, 0 deletions
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala
index 8620bb9f65..d8c6c25504 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala
@@ -131,4 +131,12 @@ class CSVInferSchemaSuite extends SparkFunSuite {
     assert(CSVInferSchema.inferField(DecimalType(20, 0), "2015-12-01 00:00:00", options) ==
       StringType)
   }
+
+  test("DoubleType should be infered when user defined nan/inf are provided") {
+    val options = new CSVOptions(Map("nanValue" -> "nan", "negativeInf" -> "-inf",
+      "positiveInf" -> "inf"))
+    assert(CSVInferSchema.inferField(NullType, "nan", options) == DoubleType)
+    assert(CSVInferSchema.inferField(NullType, "inf", options) == DoubleType)
+    assert(CSVInferSchema.inferField(NullType, "-inf", options) == DoubleType)
+  }
 }