author      hyukjinkwon <gurwls223@gmail.com>      2016-03-22 20:30:48 +0800
committer   Wenchen Fan <wenchen@databricks.com>   2016-03-22 20:30:48 +0800
commit      4e09a0d5ea50d1cfc936bc87cf3372b4a0aa7dc2 (patch)
tree        deb8c64a0a23977ad4a3bfd66794e904c817a104 /python/pyspark/sql
parent      f2e855fba8eb73475cf312cdf880c1297d4323bb (diff)
[SPARK-13953][SQL] Specifying the field name for corrupted record via option at JSON datasource
## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-13953

Currently, the JSON data source creates a new field in `PERMISSIVE` mode for storing the malformed string. This field can be renamed via the `spark.sql.columnNameOfCorruptRecord` option, but that is a global configuration. This PR makes the option applicable per read, so it can be specified via `option()`. When set, it overrides `spark.sql.columnNameOfCorruptRecord`.

## How was this patch tested?

Unit tests, plus `./dev/run_tests` for coding style checks.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #11881 from HyukjinKwon/SPARK-13953.
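The interaction between the parse modes and the corrupt-record column described above can be sketched with a toy pure-Python parser. This is an illustrative simulation only (the function name and signature are hypothetical, not Spark's actual implementation), but it mirrors the documented behavior: `PERMISSIVE` nulls out the schema fields and stores the raw line in the (now per-read configurable) corrupt-record column, `DROPMALFORMED` skips the record, and `FAILFAST` raises.

```python
import json


def read_json_lines(lines, schema_fields, mode="PERMISSIVE",
                    column_name_of_corrupt_record="_corrupt_record"):
    """Toy simulation of the JSON datasource's parse modes (not Spark code)."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            # Well-formed record: keep only the schema fields.
            rows.append({f: record.get(f) for f in schema_fields})
        except ValueError:
            if mode == "PERMISSIVE":
                # Null out schema fields and stash the raw text in the
                # corrupt-record column, whose name is now a per-read option.
                row = {f: None for f in schema_fields}
                row[column_name_of_corrupt_record] = line
                rows.append(row)
            elif mode == "DROPMALFORMED":
                continue  # silently skip the corrupted record
            elif mode == "FAILFAST":
                raise  # surface the parse error immediately
    return rows


rows = read_json_lines(['{"name": "a"}', '{broken'], ["name"],
                       column_name_of_corrupt_record="_bad")
# rows[1] == {"name": None, "_bad": "{broken"}
```

Passing `column_name_of_corrupt_record` per call here plays the role of `option("columnNameOfCorruptRecord", ...)` in the patched `DataFrameReader`, shadowing the global `spark.sql.columnNameOfCorruptRecord` setting.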
Diffstat (limited to 'python/pyspark/sql')
-rw-r--r--   python/pyspark/sql/readwriter.py   5
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index bae9e69df8..cca57a385c 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -166,10 +166,13 @@ class DataFrameReader(object):
during parsing.
* ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted \
record and puts the malformed string into a new field configured by \
- ``spark.sql.columnNameOfCorruptRecord``. When a schema is set by user, it sets \
+ ``columnNameOfCorruptRecord``. When a schema is set by user, it sets \
``null`` for extra fields.
* ``DROPMALFORMED`` : ignores the whole corrupted records.
* ``FAILFAST`` : throws an exception when it meets corrupted records.
+ * ``columnNameOfCorruptRecord`` (default ``_corrupt_record``): allows renaming the \
+ new field having malformed string created by ``PERMISSIVE`` mode. \
+ This overrides ``spark.sql.columnNameOfCorruptRecord``.
>>> df1 = sqlContext.read.json('python/test_support/sql/people.json')
>>> df1.dtypes