path: root/docs/img/structured-streaming-late-data.png
author: hyukjinkwon <gurwls223@gmail.com> 2016-06-29 11:42:51 -0700
committer: Reynold Xin <rxin@databricks.com> 2016-06-29 11:42:51 -0700
commit: cb1b9d34f37a5574de43f61e7036c4b8b81defbf (patch)
tree: 4729d676c34ba492f804e1b79e44d132a66f60d3 /docs/img/structured-streaming-late-data.png
parent: 39f2eb1da34f26bf68c535c8e6b796d71a37a651 (diff)
[SPARK-14480][SQL] Remove meaningless StringIteratorReader for CSV data source.
## What changes were proposed in this pull request?

This PR removes the meaningless `StringIteratorReader` from the CSV data source.

In `CSVParser.scala`, there is a `Reader` wrapping an `Iterator`, which causes two problems.

First, it was not actually faster than processing line by line with the `Iterator`, because of the additional logic needed to wrap the `Iterator` in a `Reader`.
Second, it added some complexity, because it needs additional logic to allow every line to be read byte by byte. As a result, parsing issues (e.g. SPARK-14103) were fairly difficult to diagnose.

A benchmark was run manually, with the following results:

- Original code with `Reader` wrapping `Iterator`

  | End-to-end (ns) | Parse Time (ns) |
  |-----------------|-----------------|
  | 14116265034     | 2008277960      |

- New code with `Iterator`

  | End-to-end (ns) | Parse Time (ns) |
  |-----------------|-----------------|
  | 13451699644     | 1549050564      |

For details of the environment, dataset, and methods, please refer to the JIRA ticket.

## How was this patch tested?

Existing tests should cover this.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #13808 from HyukjinKwon/SPARK-14480-small.
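For illustration only, the two approaches compared in the description can be sketched as follows. This is not Spark's code: it is written in Java rather than Scala, the class `LineReaderSketch`, the wrapper `IteratorReader`, and both `parse…` methods are hypothetical names, and a naive comma split stands in for the real CSV parser. The sketch only shows why wrapping an `Iterator` of lines in a `Reader` adds an extra layer (re-joining lines into a character stream, then splitting them apart again) compared with consuming the lines directly.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LineReaderSketch {

    /** Hypothetical wrapper: exposes an Iterator of lines as a character stream. */
    static final class IteratorReader extends Reader {
        private final Iterator<String> lines;
        private String current = "";
        private int pos = 0;

        IteratorReader(Iterator<String> lines) {
            this.lines = lines;
        }

        @Override
        public int read(char[] buf, int off, int len) {
            if (len == 0) {
                return 0;
            }
            // Refill from the iterator, re-adding the newline that line splitting removed.
            while (pos >= current.length()) {
                if (!lines.hasNext()) {
                    return -1; // end of stream
                }
                current = lines.next() + "\n";
                pos = 0;
            }
            int n = Math.min(len, current.length() - pos);
            current.getChars(pos, pos + n, buf, off);
            pos += n;
            return n;
        }

        @Override
        public void close() {
            // Nothing to release in this sketch.
        }
    }

    /** Reader-based approach: lines become characters, then are cut back into lines. */
    public static List<String[]> parseViaReader(Iterator<String> lines) {
        List<String[]> rows = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        try (Reader reader = new IteratorReader(lines)) {
            int c;
            while ((c = reader.read()) != -1) {
                if (c == '\n') {
                    rows.add(line.toString().split(",", -1));
                    line.setLength(0);
                } else {
                    line.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    /** Iterator-based approach: each line is handed to the (naive) parser as-is. */
    public static List<String[]> parseViaIterator(Iterator<String> lines) {
        List<String[]> rows = new ArrayList<>();
        while (lines.hasNext()) {
            rows.add(lines.next().split(",", -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a,b,c", "1,2,3");
        for (String[] row : parseViaReader(input.iterator())) {
            System.out.println(Arrays.toString(row));
        }
        for (String[] row : parseViaIterator(input.iterator())) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Both methods return the same rows for the same input; the iterator-based one simply skips the character-stream round trip.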
Diffstat (limited to 'docs/img/structured-streaming-late-data.png')
0 files changed, 0 insertions, 0 deletions