author | hyukjinkwon <gurwls223@gmail.com> | 2016-04-29 22:52:21 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-04-29 22:52:21 -0700 |
commit | 4bac703eb9dcc286d6b89630cf433f95b63a4a1f (patch) | |
tree | 62fa102cdfcff3b96a5b77c4393cf8101f42b791 /sql/core/src/test/resources | |
parent | ac41fc648de584f08863313fbac0c5bb6fc6a65e (diff) | |
[SPARK-13667][SQL] Support for specifying custom date format for date and timestamp types at CSV datasource.
## What changes were proposed in this pull request?
This PR adds support for specifying a custom date format for `DateType` and `TimestampType`.
For `TimestampType`, the given format is used both to infer the schema and to convert the values.
For `DateType`, the given format is used to convert the values.
If `dateFormat` is not given, it falls back to `DateTimeUtils.stringToTime()` for backward compatibility.
When it is given, `SimpleDateFormat` is used to parse the data.
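As a rough illustration of the two paths described above, here is a minimal Java sketch. It is not the code from this PR: `DateFormatSketch` and `toMillis` are hypothetical names, the pattern `dd/MM/yyyy HH:mm` is an assumption chosen to match the values in `dates.csv`, and `java.sql.Timestamp.valueOf` is only a stand-in for the `DateTimeUtils.stringToTime()` fallback. The time zone is pinned to UTC purely so the result is reproducible.

```java
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

final class DateFormatSketch {
    // Hypothetical helper: converts one CSV field to epoch milliseconds.
    // `pattern` plays the role of the `dateFormat` option; null means "not given".
    static long toMillis(String field, String pattern) {
        if (pattern == null) {
            // Stand-in for the DateTimeUtils.stringToTime() fallback (NOT Spark's code).
            // Timestamp.valueOf expects "yyyy-mm-dd hh:mm:ss[.f...]" in the default zone.
            return Timestamp.valueOf(field).getTime();
        }
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: fixed zone for the example
        try {
            return format.parse(field).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + field + "' with " + pattern, e);
        }
    }

    public static void main(String[] args) {
        String pattern = "dd/MM/yyyy HH:mm"; // assumed pattern for the dates.csv values
        String[] fields = {"26/08/2015 18:00", "27/10/2014 18:30", "28/01/2016 20:00"};
        for (String field : fields) {
            System.out.println(field + " -> " + toMillis(field, pattern));
        }
    }
}
```

A fresh `SimpleDateFormat` is created per call because the class is not thread-safe; a real implementation would likely reuse one instance per task instead.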
In addition, `IntegerType`, `DoubleType` and `LongType` take precedence over `TimestampType` in type inference. This means that even if the given format is `yyyy` or `yyyy.MM`, a matching column will be inferred as `IntegerType` or `DoubleType`. Since this is type inference, I think such a precedence is acceptable.
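The precedence can be pictured with a simplified, single-field Java sketch. This is an illustration only, not Spark's `InferSchema`: the class and method names are hypothetical, the type names are returned as plain strings, and the order in which the numeric types are tried among themselves is an assumption. The only point it demonstrates is that the numeric checks run before the date format is consulted.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

final class InferencePrecedenceSketch {
    // Hypothetical, simplified inference for one field: numeric types are tried
    // first, and the custom date format is only consulted if all of them fail.
    static String inferType(String field, String dateFormat) {
        try {
            Integer.parseInt(field);
            return "IntegerType";
        } catch (NumberFormatException e) {
            // not an int; fall through to the next candidate
        }
        try {
            Long.parseLong(field);
            return "LongType";
        } catch (NumberFormatException e) {
            // not a long; fall through
        }
        try {
            Double.parseDouble(field);
            return "DoubleType";
        } catch (NumberFormatException e) {
            // not a double; fall through
        }
        SimpleDateFormat format = new SimpleDateFormat(dateFormat);
        format.setLenient(false); // assumption: strict parsing for the example
        try {
            format.parse(field);
            return "TimestampType";
        } catch (ParseException e) {
            return "StringType";
        }
    }

    public static void main(String[] args) {
        System.out.println(inferType("2015", "yyyy"));
        System.out.println(inferType("2015.08", "yyyy.MM"));
        System.out.println(inferType("26/08/2015 18:00", "dd/MM/yyyy HH:mm"));
    }
}
```

With this ordering, a value such as `2015` never reaches the date parser, which is why a `yyyy` format cannot produce `TimestampType` for such a field.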
In addition, I renamed `csv.CSVInferSchema` to `csv.InferSchema` because the JSON datasource has `json.InferSchema`. Although the two now share a name, I did this because I thought the parent package name can still differentiate them. Accordingly, the suite name was also changed from `CSVInferSchemaSuite` to `InferSchemaSuite`.
## How was this patch tested?
Unit tests were used, along with `./dev/run_tests` for coding style checks.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #11550 from HyukjinKwon/SPARK-13667.
Diffstat (limited to 'sql/core/src/test/resources')
-rw-r--r-- | sql/core/src/test/resources/dates.csv | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/sql/core/src/test/resources/dates.csv b/sql/core/src/test/resources/dates.csv
new file mode 100644
index 0000000000..9ee99c31b3
--- /dev/null
+++ b/sql/core/src/test/resources/dates.csv
@@ -0,0 +1,4 @@
+date
+26/08/2015 18:00
+27/10/2014 18:30
+28/01/2016 20:00