commit 12a89e55cbd630fa2986da984e066cd07d3bf1f7
author: Dongjoon Hyun <dongjoon@apache.org> 2016-08-16 10:01:30 -0700
committer: Davies Liu <davies.liu@gmail.com> 2016-08-16 10:01:30 -0700
tree: 3bfcd749953b0e17b25374b971f3b44bf7dc175e /python/pyspark/sql/tests.py
parent: 6f0988b1293a5e5ee3620b2727ed969155d7ac0d
[SPARK-17035] [SQL] [PYSPARK] Improve Timestamp not to lose precision for all cases
## What changes were proposed in this pull request?
`PySpark` loses `microsecond` precision in some corner cases when converting a `Timestamp` into a `Long`. For example, `datetime.max` should be converted to a value whose last 6 digits are `999999`. This PR improves the conversion logic so that no precision is lost in any case.
**Corner case**
```python
>>> datetime.datetime.max
datetime.datetime(9999, 12, 31, 23, 59, 59, 999999)
```
**Before**
```python
>>> from datetime import datetime
>>> from pyspark.sql import Row
>>> from pyspark.sql.types import StructType, StructField, TimestampType
>>> schema = StructType([StructField("dt", TimestampType(), False)])
>>> [schema.toInternal(row) for row in [{"dt": datetime.max}]]
[(253402329600000000,)]
```
**After**
```python
>>> [schema.toInternal(row) for row in [{"dt": datetime.max}]]
[(253402329599999999,)]
```
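The loss comes from doing the conversion in floating point: at magnitudes near `datetime.max`, a 64-bit float cannot represent every microsecond, so the last digits round away. A minimal sketch of the arithmetic (the `seconds` value below is illustrative, corresponding to `datetime.max` as UTC epoch seconds; this is not the verbatim patched code in `TimestampType.toInternal`):

```python
seconds = 253402329599  # datetime.max as epoch seconds (illustrative, UTC)
micros = 999999

# Float arithmetic: 2.53e17 exceeds the 53-bit mantissa of a double,
# so adding the microseconds rounds to the nearest representable float.
lossy = int(seconds * 1e6 + micros)

# Integer arithmetic: Python ints are arbitrary precision, so the
# microseconds survive intact.
exact = int(seconds) * 1000000 + micros

print(lossy)  # 253402329600000000 -- last 6 digits lost
print(exact)  # 253402329599999999 -- precise
```

Keeping the multiplication and addition in integer arithmetic is what preserves the full microsecond value.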
## How was this patch tested?
Pass the Jenkins test with a new test case.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #14631 from dongjoon-hyun/SPARK-17035.
Diffstat (limited to 'python/pyspark/sql/tests.py')

 python/pyspark/sql/tests.py | 5 +++++
 1 file changed, 5 insertions(+), 0 deletions(-)
```diff
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 520b09d9c6..fc41701b59 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -178,6 +178,11 @@ class DataTypeTests(unittest.TestCase):
         dt = DateType()
         self.assertEqual(dt.fromInternal(0), datetime.date(1970, 1, 1))
 
+    # regression test for SPARK-17035
+    def test_timestamp_microsecond(self):
+        tst = TimestampType()
+        self.assertEqual(tst.toInternal(datetime.datetime.max) % 1000000, 999999)
+
     def test_empty_row(self):
         row = Row()
         self.assertEqual(len(row), 0)
```