[SPARK-19843][SQL] UTF8String => (int / long) conversion expensive for invalid inputs - spark

author	Tejas Patil <tejasp@fb.com>	2017-03-07 20:19:30 -0800
committer	Wenchen Fan <wenchen@databricks.com>	2017-03-07 20:19:30 -0800
commit	c96d14abae5962a7b15239319c2a151b95f7db94 (patch)
tree	eefd1b2e220a4a7afc901e29418f3f5ee92f21d1 /sql/core/src/test
parent	47b2f68a885b7a2fc593ac7a55cd19742016364d (diff)

## What changes were proposed in this pull request?

Jira: https://issues.apache.org/jira/browse/SPARK-19843

Created wrapper classes (`IntWrapper`, `LongWrapper`) to hold the result of parsing (which is a primitive type). The parsing method now returns a boolean that indicates whether parsing succeeded, and the parsed value is written into the wrapper.

## How was this patch tested?

- Added new unit tests
- Ran a prod job which had conversion from string -> int and verified the outputs

## Performance

Tiny regression when all strings are valid integers

```
conversion to int:       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------
trunk                          502 /  522         33.4          29.9       1.0X
SPARK-19843                    493 /  503         34.0          29.4       1.0X
```

Huge gain when all strings are invalid integers

```
conversion to int:       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------
trunk                        33913 / 34219          0.5        2021.4       1.0X
SPARK-19843                    154 /  162         108.8           9.2     220.0X
```

Author: Tejas Patil <tejasp@fb.com>

Closes #17184 from tejasapatil/SPARK-19843_is_numeric_maybe.
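
The wrapper approach described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: only the name `IntWrapper` comes from the commit message, while `ParseSketch`, the `value` field, and the `toInt(String, IntWrapper)` signature are hypothetical, and the sketch works on a plain `String` rather than `UTF8String`. It also skips details such as whitespace trimming. The point it shows is that an invalid input is reported through the boolean return value, so the failure path does not need to throw an exception.

```java
// Hypothetical holder for a primitive parse result (name taken from the commit message).
final class IntWrapper {
    int value = 0;
}

// Hypothetical helper class; not part of Spark.
final class ParseSketch {
    // Returns true and stores the parsed number in `out` on success.
    // Returns false, without throwing, when `s` is not a valid int.
    static boolean toInt(String s, IntWrapper out) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int i = 0;
        boolean negative = s.charAt(0) == '-';
        if (negative || s.charAt(0) == '+') {
            if (s.length() == 1) {
                return false; // a lone sign is not a number
            }
            i = 1;
        }
        final int radix = 10;
        final int stopValue = Integer.MIN_VALUE / radix;
        int result = 0; // accumulated as a negative number so MIN_VALUE fits
        for (; i < s.length(); i++) {
            int digit = s.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                return false; // non-digit character
            }
            if (result < stopValue) {
                return false; // multiplying by 10 would overflow
            }
            result = result * radix - digit;
            if (result > 0) {
                return false; // subtraction wrapped around
            }
        }
        if (!negative) {
            result = -result;
            if (result < 0) {
                return false; // magnitude was MIN_VALUE, which has no positive counterpart
            }
        }
        out.value = result;
        return true;
    }

    public static void main(String[] args) {
        IntWrapper w = new IntWrapper();
        for (String s : new String[] {"42", "-7", "abc", "99999999999"}) {
            if (toInt(s, w)) {
                System.out.println(s + " parsed as " + w.value);
            } else {
                System.out.println(s + " is not a valid int");
            }
        }
    }
}
```

Because the wrapper can be allocated once and reused across rows, a caller in this style pays no per-row allocation for either valid or invalid inputs.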

Diffstat (limited to 'sql/core/src/test')

0 files changed, 0 insertions, 0 deletions

