diff options
author | hyukjinkwon <gurwls223@gmail.com> | 2016-08-24 22:16:20 +0200 |
---|---|---|
committer | Herman van Hovell <hvanhovell@databricks.com> | 2016-08-24 22:16:20 +0200 |
commit | 29952ed096fd2a0a19079933ff691671d6f00835 (patch) | |
tree | 88846c8853b674b098e0c85ee1c462a8b99fb93a /data/mllib/pic_data.txt | |
parent | 891ac2b914fb6f90a62c6fbc0a3960a89d1c1d92 (diff) | |
download | spark-29952ed096fd2a0a19079933ff691671d6f00835.tar.gz spark-29952ed096fd2a0a19079933ff691671d6f00835.tar.bz2 spark-29952ed096fd2a0a19079933ff691671d6f00835.zip |
[SPARK-16216][SQL] Read/write timestamps and dates in ISO 8601 and dateFormat/timestampFormat option for CSV and JSON
## What changes were proposed in this pull request?
### Default - ISO 8601
Currently, CSV datasource is writing `Timestamp` and `Date` as numeric form and JSON datasource is writing both as below:
- CSV
```
// TimestampType
1414459800000000
// DateType
16673
```
- Json
```
// TimestampType
1970-01-01 11:46:40.0
// DateType
1970-01-01
```
So, for CSV we can't read back what we write and for JSON it becomes ambiguous because the timezone is being missed.
So, this PR make both **write** `Timestamp` and `Date` in ISO 8601 formatted string (please refer the [ISO 8601 specification](https://www.w3.org/TR/NOTE-datetime)).
- For `Timestamp` it becomes as below: (`yyyy-MM-dd'T'HH:mm:ss.SSSZZ`)
```
1970-01-01T02:00:01.000-01:00
```
- For `Date` it becomes as below (`yyyy-MM-dd`)
```
1970-01-01
```
### Custom date format option - `dateFormat`
This PR also adds the support to write and read dates and timestamps in a formatted string as below:
- **DateType**
- With `dateFormat` option (e.g. `yyyy/MM/dd`)
```
+----------+
| date|
+----------+
|2015/08/26|
|2014/10/27|
|2016/01/28|
+----------+
```
### Custom date format option - `timestampFormat`
- **TimestampType**
- With `dateFormat` option (e.g. `dd/MM/yyyy HH:mm`)
```
+----------------+
| date|
+----------------+
|2015/08/26 18:00|
|2014/10/27 18:30|
|2016/01/28 20:00|
+----------------+
```
## How was this patch tested?
Unit tests were added in `CSVSuite` and `JsonSuite`. For JSON, existing tests cover the default cases.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #14279 from HyukjinKwon/SPARK-16216-json-csv.
Diffstat (limited to 'data/mllib/pic_data.txt')
0 files changed, 0 insertions, 0 deletions