field | value | date |
---|---|---|
author | Jurriaan Pruis <email@jurriaanpruis.nl> | 2016-05-25 12:40:16 -0700 |
committer | Reynold Xin <rxin@databricks.com> | 2016-05-25 12:40:16 -0700 |
commit | c875d81a3de3f209b9eb03adf96b7c740b2c7b52 | |
tree | f09a2b335d592b1c40e42b2f557e8f643768dc3e /python | |
parent | 4b88067416ce922ae15a1445cf953fb9b5c43427 | |
[SPARK-15493][SQL] default QuoteEscapingEnabled flag to true when writing CSV
## What changes were proposed in this pull request?
Set the `QuoteEscapingEnabled` flag to true by default when writing CSV, and add an `escapeQuotes` option so users can change this behavior.
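To make the effect of the flag concrete, here is a simplified, hypothetical model of how a writer might decide whether to quote a field. It is based only on the `escapeQuotes` docstring added in this patch, not on univocity's actual implementation: with the flag on, any value containing the quote character is enclosed in quotes and its inner quotes are escaped with the escape character (default `\`); with the flag off, such a value is assumed to be written unquoted unless it needs quoting for another reason. The function name `format_field` and its parameters are illustrative.

```python
def format_field(value, sep=",", quote='"', escape="\\", escape_quotes=True):
    """Hypothetical model of CSV field formatting; not univocity's real logic."""
    # Assumption: values containing the separator or a line break always need quoting.
    needs_quotes = sep in value or "\n" in value or "\r" in value
    # With escape_quotes on, a quote character anywhere also forces quoting.
    if escape_quotes and quote in value:
        needs_quotes = True
    if not needs_quotes:
        return value
    # Inside a quoted field, escape each quote with the escape character.
    return quote + value.replace(quote, escape + quote) + quote


print(format_field('say "hi"'))                       # "say \"hi\""
print(format_field('say "hi"', escape_quotes=False))  # say "hi"
```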
See https://github.com/uniVocity/univocity-parsers/blob/f3eb2af26374940e60d91d1703bde54619f50c51/src/main/java/com/univocity/parsers/csv/CsvWriterSettings.java#L231-L247
This change is needed to write RFC 4180-compatible CSV files (https://tools.ietf.org/html/rfc4180#section-2).
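For comparison, RFC 4180 section 2 says that fields containing line breaks, double quotes, or commas should be enclosed in double quotes, and that a double quote inside a quoted field is escaped by doubling it. Python's standard `csv` module follows these rules with its default dialect, as the sketch below shows. This is not Spark output: Spark's default escape character is `\`, so matching RFC 4180 exactly would presumably also require setting `escape` to `"`.

```python
import csv
import io

buf = io.StringIO()
# Default dialect: delimiter=",", quotechar='"', doublequote=True,
# quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n".
writer = csv.writer(buf)
writer.writerow(["a", 'say "hi"', "x,y"])
print(repr(buf.getvalue()))  # 'a,"say ""hi""","x,y"\r\n'
```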
https://issues.apache.org/jira/browse/SPARK-15493
## How was this patch tested?
Added a test that verifies the output is quoted correctly.
Author: Jurriaan Pruis <email@jurriaanpruis.nl>
Closes #13267 from jurriaan/quote-escaping.
Diffstat (limited to 'python')
-rw-r--r-- | python/pyspark/sql/readwriter.py | 7 |
1 files changed, 6 insertions, 1 deletions
```diff
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 6f788cf50c..73d2b81b6b 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -769,7 +769,7 @@ class DataFrameWriter(object):

     @since(2.0)
     def csv(self, path, mode=None, compression=None, sep=None, quote=None, escape=None,
-            header=None, nullValue=None):
+            header=None, nullValue=None, escapeQuotes=None):
         """Saves the content of the [[DataFrame]] in CSV format at the specified path.

         :param path: the path in any Hadoop supported file system
@@ -790,6 +790,9 @@ class DataFrameWriter(object):
                       value, ``"``.
         :param escape: sets the single character used for escaping quotes inside an already
                        quoted value. If None is set, it uses the default value, ``\``
+        :param escapeQuotes: A flag indicating whether values containing quotes should always
+                             be enclosed in quotes. If None is set, it uses the default value
+                             ``true``, escaping all values containing a quote character.
         :param header: writes the names of columns as the first line. If None is set, it uses
                        the default value, ``false``.
         :param nullValue: sets the string representation of a null value. If None is set, it uses
@@ -810,6 +813,8 @@ class DataFrameWriter(object):
             self.option("header", header)
         if nullValue is not None:
             self.option("nullValue", nullValue)
+        if escapeQuotes is not None:
+            self.option("escapeQuotes", escapeQuotes)
         self._jwrite.csv(path)

     @since(1.5)
```
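The `csv` method forwards each keyword argument that is not `None` as a writer option under its own name, so options left unset fall back to the JVM-side defaults. The sketch below reproduces that pattern with a hypothetical `RecordingWriter` class that records options in a dict instead of calling into the JVM; it is not PySpark's `DataFrameWriter`, and only a subset of the parameters is modeled. Note that each option must be given the value of its own parameter.

```python
class RecordingWriter:
    """Hypothetical stand-in for DataFrameWriter that records options locally."""

    def __init__(self):
        self.options = {}
        self.path = None

    def option(self, key, value):
        # Record the option and return self so calls can be chained.
        self.options[key] = value
        return self

    def csv(self, path, header=None, nullValue=None, escapeQuotes=None):
        # Only forward options the caller set explicitly; None means "use the default".
        if header is not None:
            self.option("header", header)
        if nullValue is not None:
            self.option("nullValue", nullValue)
        if escapeQuotes is not None:
            # Forward the escapeQuotes argument itself, not another parameter.
            self.option("escapeQuotes", escapeQuotes)
        self.path = path


w = RecordingWriter()
w.csv("/tmp/out", nullValue="NA", escapeQuotes=False)
print(w.options)  # {'nullValue': 'NA', 'escapeQuotes': False}
```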