author | hyukjinkwon <gurwls223@gmail.com> | 2016-09-21 10:35:29 +0100 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2016-09-21 10:35:29 +0100 |
commit | 25a020be99b6a540e4001e59e40d5d1c8aa53812 (patch) | |
tree | af8108bb755277af338209736be43fe81ff58e7b /docs/structured-streaming-programming-guide.md | |
parent | 57dc326bd00cf0a49da971e9c573c48ae28acaa2 (diff) | |
download | spark-25a020be99b6a540e4001e59e40d5d1c8aa53812.tar.gz spark-25a020be99b6a540e4001e59e40d5d1c8aa53812.tar.bz2 spark-25a020be99b6a540e4001e59e40d5d1c8aa53812.zip |
[SPARK-17583][SQL] Remove useless rowSeparator variable and set auto-expanding buffer as default for maxCharsPerColumn option in CSV
## What changes were proposed in this pull request?
This PR includes the changes below:
1. Upgrade Univocity library from 2.1.1 to 2.2.1
This brings some performance improvements and also enables an auto-expanding buffer for the `maxCharsPerColumn` option in CSV. Please refer to the [release notes](https://github.com/uniVocity/univocity-parsers/releases).
2. Remove the unused `rowSeparator` variable in `CSVOptions`
We have this unused variable in [CSVOptions.scala#L127](https://github.com/apache/spark/blob/29952ed096fd2a0a19079933ff691671d6f00835/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L127), but it can cause confusion because it suggests special handling of `\r\n` that it does not actually perform. For example, there is an open issue about this, [SPARK-17227](https://issues.apache.org/jira/browse/SPARK-17227), describing this variable.
This variable is effectively unused because we rely on `LineRecordReader` in Hadoop, which handles only `\n` and `\r\n`.
3. Set the default value of `maxCharsPerColumn` to auto-expanding.
We currently set 1000000 as the maximum length of each column. It would be more sensible to allow auto-expanding by default rather than a fixed length.
To confirm, the use of `-1` for this purpose is described in the release notes for [2.2.0](https://github.com/uniVocity/univocity-parsers/releases/tag/v2.2.0).
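To illustrate the difference between a fixed limit and the `-1` setting described in item 3, here is a minimal Python sketch. It is a toy model only, not Univocity's or Spark's code: the `read_column` function, the `UNLIMITED` constant, and the error message are hypothetical names made up for this example.

```python
# Toy model of a per-column character limit; NOT Univocity's actual code.
# `read_column` and `UNLIMITED` are hypothetical names used for illustration.
UNLIMITED = -1  # the value the commit message describes for auto-expansion


def read_column(value: str, max_chars_per_column: int = UNLIMITED) -> str:
    """Accumulate one column value, enforcing an optional length cap."""
    buffer = []
    for ch in value:
        # With a fixed cap, reject the value once it would grow past the limit.
        if max_chars_per_column != UNLIMITED and len(buffer) >= max_chars_per_column:
            raise ValueError(
                f"column exceeds {max_chars_per_column} characters"
            )
        buffer.append(ch)  # with -1 the buffer simply keeps growing
    return "".join(buffer)


long_value = "x" * 1_500_000

# Auto-expanding default: the long value is read in full.
assert len(read_column(long_value)) == 1_500_000

# Previous fixed default of 1000000: the same value is rejected.
try:
    read_column(long_value, max_chars_per_column=1_000_000)
except ValueError as err:
    print(err)
```

Under this model, a column longer than the old fixed default fails to parse, while the `-1` default accepts it at the cost of unbounded memory use for a single value.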
## How was this patch tested?
N/A
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #15138 from HyukjinKwon/SPARK-17583.
Diffstat (limited to 'docs/structured-streaming-programming-guide.md')
0 files changed, 0 insertions, 0 deletions