author | Dongjoon Hyun <dongjoon@apache.org> | 2017-01-03 23:06:50 +0800
committer | Wenchen Fan <wenchen@databricks.com> | 2017-01-03 23:06:50 +0800
commit | 7a2b5f93bc3d3224470837ed3323964ba7cb1dca
tree | 12853f95a3e2bfa2887a1b5bb8b62326521f6d75 /resource-managers
parent | 52636226dc8cb7fcf00381d65e280d651b25a382
[SPARK-18877][SQL] `CSVInferSchema.inferField` on DecimalType should find a common type with `typeSoFar`
## What changes were proposed in this pull request?
CSV type inference causes an `IllegalArgumentException` on decimal numbers with heterogeneous precisions and scales because the current logic keeps only the last decimal type seen in a **partition**. Specifically, `inferRowType`, the **seqOp** of **aggregate**, returns the last decimal type unchanged instead of merging it with the type accumulated so far. This PR fixes it to merge via `findTightestCommonType`.
**decimal.csv**
```
9.03E+12
1.19E+11
```
**BEFORE**
```scala
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").printSchema
root
|-- _c0: decimal(3,-9) (nullable = true)
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").show
16/12/16 14:32:49 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 4)
java.lang.IllegalArgumentException: requirement failed: Decimal precision 4 exceeds max precision 3
```
**AFTER**
```scala
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").printSchema
root
|-- _c0: decimal(4,-9) (nullable = true)
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").show
+---------+
| _c0|
+---------+
|9.030E+12|
| 1.19E+11|
+---------+
```
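The merge rule can be illustrated with a small sketch. This is not Spark's actual implementation; `Dec` and `tightestCommonDecimal` are hypothetical names standing in for `DecimalType` and the decimal case of `findTightestCommonType`. The idea: keep the larger scale, keep enough integer digits (`precision - scale`) for both types, and derive the combined precision from those.

```scala
// Hedged sketch (hypothetical names, not Spark's actual code) of finding the
// tightest common decimal type for two inferred types.
case class Dec(precision: Int, scale: Int)

def tightestCommonDecimal(a: Dec, b: Dec): Dec = {
  // Keep the finer (larger) scale of the two types.
  val scale = math.max(a.scale, b.scale)
  // Integer-digit range each type needs; take the larger.
  val range = math.max(a.precision - a.scale, b.precision - b.scale)
  // Combined precision = integer digits + fractional digits.
  Dec(range + scale, scale)
}

// 9.03E+12 infers as decimal(3,-10) and 1.19E+11 as decimal(3,-9);
// their tightest common type is decimal(4,-9), matching the AFTER output.
println(tightestCommonDecimal(Dec(3, -10), Dec(3, -9)))  // Dec(4,-9)
```

Under the old last-type-wins behavior, the result would have been `decimal(3,-9)` (the type of the last row), which cannot hold `9.03E+12` and triggers the precision error shown above.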
## How was this patch tested?
Pass the newly added test case.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #16320 from dongjoon-hyun/SPARK-18877.