| field | value | date |
|---|---|---|
| author | Tejas Patil <tejasp@fb.com> | 2017-03-07 20:19:30 -0800 |
| committer | Wenchen Fan <wenchen@databricks.com> | 2017-03-07 20:19:30 -0800 |
| commit | c96d14abae5962a7b15239319c2a151b95f7db94 | |
| tree | eefd1b2e220a4a7afc901e29418f3f5ee92f21d1 /sql/core/src/test | |
| parent | 47b2f68a885b7a2fc593ac7a55cd19742016364d | |
[SPARK-19843][SQL] UTF8String => (int / long) conversion expensive for invalid inputs
## What changes were proposed in this pull request?
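JIRA: https://issues.apache.org/jira/browse/SPARK-19843
Created wrapper classes (`IntWrapper`, `LongWrapper`) to hold the parsed result, which is a primitive (`int` or `long`). The conversion methods now return a boolean indicating whether parsing succeeded, writing the value into the wrapper on success, instead of signalling invalid input with an exception.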
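A minimal Java sketch of the wrapper-plus-boolean pattern described above. `IntWrapper` here is a simplified, hypothetical stand-in rather than Spark's actual class, and `toInt`, `toIntOrThrow`, and `castToIntOrNull` are illustrative names. The parser accepts only an optional sign followed by ASCII digits in a UTF-8 byte array; it omits whatever whitespace trimming or other input forms Spark's real `UTF8String` conversion handles. The exception-throwing variant is included only for contrast, assuming that is how the old path reported invalid input.

```java
import java.nio.charset.StandardCharsets;

public class WrapperParseSketch {

    // Hypothetical, simplified stand-in for the wrapper classes: a mutable
    // holder that receives the primitive result of a successful parse.
    public static final class IntWrapper {
        public int value;
    }

    // Parses an optionally signed run of ASCII digits from UTF-8 bytes.
    // On success, stores the result in out.value and returns true; on invalid
    // input or overflow, returns false without allocating an exception.
    public static boolean toInt(byte[] utf8, IntWrapper out) {
        int n = utf8.length;
        if (n == 0) {
            return false;
        }
        int i = 0;
        boolean negative = false;
        if (utf8[0] == '-' || utf8[0] == '+') {
            negative = utf8[0] == '-';
            i = 1;
            if (n == 1) {
                return false; // a lone sign is not a number
            }
        }
        // Accumulate as a negative value so Integer.MIN_VALUE is representable.
        final int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        final int multLimit = limit / 10;
        int result = 0;
        for (; i < n; i++) {
            int digit = utf8[i] - '0';
            if (digit < 0 || digit > 9) {
                return false; // non-digit byte
            }
            if (result < multLimit) {
                return false; // result * 10 would overflow
            }
            result *= 10;
            if (result < limit + digit) {
                return false; // result - digit would overflow
            }
            result -= digit;
        }
        out.value = negative ? result : -result;
        return true;
    }

    // Exception-based variant for contrast: every invalid input pays for
    // constructing a NumberFormatException and filling in its stack trace.
    public static int toIntOrThrow(byte[] utf8) {
        IntWrapper out = new IntWrapper();
        if (!toInt(utf8, out)) {
            throw new NumberFormatException(new String(utf8, StandardCharsets.UTF_8));
        }
        return out.value;
    }

    // Caller sketch: a string-to-int cast that yields null for invalid input.
    // A caller could also reuse one wrapper across rows to avoid an allocation per call.
    public static Integer castToIntOrNull(String s) {
        IntWrapper w = new IntWrapper();
        return toInt(s.getBytes(StandardCharsets.UTF_8), w) ? w.value : null;
    }

    public static void main(String[] args) {
        System.out.println(castToIntOrNull("123"));        // 123
        System.out.println(castToIntOrNull("-42"));        // -42
        System.out.println(castToIntOrNull("abc"));        // null
        System.out.println(castToIntOrNull("2147483648")); // null (overflow)
    }
}
```

Returning a boolean turns failure into an ordinary branch, so invalid inputs skip exception construction entirely. The trade-off is that callers must check the return value before reading `value`.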
## How was this patch tested?
- Added new unit tests
- Ran a production job that converts strings to ints and verified the outputs
## Performance
No measurable regression when all strings are valid integers (within noise; both runs are 1.0X relative)
```
conversion to int: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
--------------------------------------------------------------------------------
trunk 502 / 522 33.4 29.9 1.0X
SPARK-19843 493 / 503 34.0 29.4 1.0X
```
Roughly 220X faster when all strings are invalid integers
```
conversion to int: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------
trunk 33913 / 34219 0.5 2021.4 1.0X
SPARK-19843 154 / 162 108.8 9.2 220.0X
```
Author: Tejas Patil <tejasp@fb.com>
Closes #17184 from tejasapatil/SPARK-19843_is_numeric_maybe.