author: Jurriaan Pruis <email@jurriaanpruis.nl>, 2016-05-25 12:40:16 -0700
committer: Reynold Xin <rxin@databricks.com>, 2016-05-25 12:40:16 -0700
commit: c875d81a3de3f209b9eb03adf96b7c740b2c7b52
tree: f09a2b335d592b1c40e42b2f557e8f643768dc3e /python
parent: 4b88067416ce922ae15a1445cf953fb9b5c43427
[SPARK-15493][SQL] default QuoteEscapingEnabled flag to true when writing CSV
## What changes were proposed in this pull request?

Default QuoteEscapingEnabled flag to true when writing CSV and add an escapeQuotes option to be able to change this.

See https://github.com/uniVocity/univocity-parsers/blob/f3eb2af26374940e60d91d1703bde54619f50c51/src/main/java/com/univocity/parsers/csv/CsvWriterSettings.java#L231-L247

This change is needed to be able to write RFC 4180 compatible CSV files (https://tools.ietf.org/html/rfc4180#section-2)

https://issues.apache.org/jira/browse/SPARK-15493

## How was this patch tested?

Added a test that verifies the output is quoted correctly.

Author: Jurriaan Pruis <email@jurriaanpruis.nl>

Closes #13267 from jurriaan/quote-escaping.
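
To make the effect of the flag concrete, here is a minimal pure-Python sketch of the quoting rule described above. It is a simplified model written for illustration only, not Spark's or univocity's actual code; the names `write_field` and `write_row` and their parameters are hypothetical. It assumes a value is enclosed in quotes when it contains the separator, a newline, or (when `escape_quotes` is true) a quote character, and that embedded quotes are prefixed with the escape character.

```python
def write_field(value, sep=",", quote='"', escape="\\", escape_quotes=True):
    """Render one CSV field (illustrative model, not Spark's implementation)."""
    has_quote = quote in value
    # Assumption: separators and newlines always force quoting; an embedded
    # quote character forces quoting only when escape_quotes is enabled.
    needs_quotes = sep in value or "\n" in value or (escape_quotes and has_quote)
    if not needs_quotes:
        return value
    # Prefix every embedded quote with the escape character, then wrap
    # the whole value in quote characters.
    escaped = value.replace(quote, escape + quote)
    return quote + escaped + quote


def write_row(values, **kwargs):
    """Join rendered fields with the separator (hypothetical helper)."""
    sep = kwargs.get("sep", ",")
    return sep.join(write_field(v, **kwargs) for v in values)


print(write_field('a"b'))                       # quoted and escaped
print(write_field('a"b', escape_quotes=False))  # written as-is
print(write_field('a"b', escape='"'))           # RFC 4180 style doubled quote
```

In this sketch, passing `escape='"'` produces the doubled-quote form that RFC 4180 section 2 describes, while the backslash default mirrors the `escape` default documented in the diff below.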
Diffstat (limited to 'python')
-rw-r--r-- python/pyspark/sql/readwriter.py | 7
1 files changed, 6 insertions, 1 deletions
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 6f788cf50c..73d2b81b6b 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -769,7 +769,7 @@ class DataFrameWriter(object):
 
     @since(2.0)
     def csv(self, path, mode=None, compression=None, sep=None, quote=None, escape=None,
-            header=None, nullValue=None):
+            header=None, nullValue=None, escapeQuotes=None):
         """Saves the content of the [[DataFrame]] in CSV format at the specified path.
 
         :param path: the path in any Hadoop supported file system
@@ -790,6 +790,9 @@ class DataFrameWriter(object):
                       value, ``"``.
         :param escape: sets the single character used for escaping quotes inside an already
                        quoted value. If None is set, it uses the default value, ``\``
+        :param escapeQuotes: A flag indicating whether values containing quotes should always
+                             be enclosed in quotes. If None is set, it uses the default value
+                             ``true``, escaping all values containing a quote character.
         :param header: writes the names of columns as the first line. If None is set, it uses
                        the default value, ``false``.
         :param nullValue: sets the string representation of a null value. If None is set, it uses
@@ -810,6 +813,8 @@ class DataFrameWriter(object):
             self.option("header", header)
         if nullValue is not None:
             self.option("nullValue", nullValue)
+        if escapeQuotes is not None:
+            self.option("escapeQuotes", escapeQuotes)
         self._jwrite.csv(path)
 
     @since(1.5)
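
The last hunk follows the writer's existing pattern of forwarding a keyword argument as an option only when the caller supplied it, so an unset (`None`) argument leaves the backend default in place. A minimal sketch of that pattern, assuming a hypothetical stand-in class `SketchWriter` that only records options rather than calling into the JVM:

```python
class SketchWriter(object):
    """Hypothetical stand-in for DataFrameWriter; it only records options."""

    def __init__(self):
        self.options = {}
        self.path = None

    def option(self, key, value):
        # Record the option and return self so calls can be chained.
        self.options[key] = value
        return self

    def csv(self, path, header=None, nullValue=None, escapeQuotes=None):
        # Forward each option only when it was explicitly set, passing the
        # value of the matching argument under the matching key.
        if header is not None:
            self.option("header", header)
        if nullValue is not None:
            self.option("nullValue", nullValue)
        if escapeQuotes is not None:
            self.option("escapeQuotes", escapeQuotes)
        self.path = path  # the real writer would trigger the write here
        return self


writer = SketchWriter().csv("/tmp/out", escapeQuotes=False)
print(writer.options)
```

Because `False` is a meaningful setting, the guard checks `is not None` rather than truthiness; a plain `if escapeQuotes:` would silently drop an explicit request to disable the flag.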