aboutsummaryrefslogtreecommitdiff
path: root/python/pyspark/sql/functions.py
diff options
context:
space:
mode:
author: hyukjinkwon <gurwls223@gmail.com> 2016-11-01 12:46:41 -0700
committer: Michael Armbrust <michael@databricks.com> 2016-11-01 12:46:41 -0700
commit: 01dd0083011741c2bbe5ae1d2a25f2c9a1302b76 (patch)
tree: 7b9993165b1a4f48e64d566d93c7883a3096403d /python/pyspark/sql/functions.py
parent: cfac17ee1cec414663b957228e469869eb7673c1 (diff)
download: spark-01dd0083011741c2bbe5ae1d2a25f2c9a1302b76.tar.gz
spark-01dd0083011741c2bbe5ae1d2a25f2c9a1302b76.tar.bz2
spark-01dd0083011741c2bbe5ae1d2a25f2c9a1302b76.zip
[SPARK-17764][SQL] Add `to_json` supporting to convert nested struct column to JSON string
## What changes were proposed in this pull request? This PR proposes to add `to_json` function in contrast with `from_json` in Scala, Java and Python. It'd be useful if we can convert a same column from/to json. Also, some datasources do not support nested types. If we are forced to save a dataframe into those data sources, we might be able to work around by this function. The usage is as below: ``` scala val df = Seq(Tuple1(Tuple1(1))).toDF("a") df.select(to_json($"a").as("json")).show() ``` ``` bash +--------+ | json| +--------+ |{"_1":1}| +--------+ ``` ## How was this patch tested? Unit tests in `JsonFunctionsSuite` and `JsonExpressionsSuite`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #15354 from HyukjinKwon/SPARK-17764.
Diffstat (limited to 'python/pyspark/sql/functions.py')
-rw-r--r--  python/pyspark/sql/functions.py  23
1 file changed, 23 insertions, 0 deletions
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 7fa3fd2de7..45e3c22bfc 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -1744,6 +1744,29 @@ def from_json(col, schema, options={}):
return Column(jc)
+@ignore_unicode_prefix
+@since(2.1)
+def to_json(col, options={}):
+ """
+ Converts a column containing a [[StructType]] into a JSON string. Throws an exception,
+ in the case of an unsupported type.
+
+ :param col: name of column containing the struct
+ :param options: options to control converting. accepts the same options as the json datasource
+
+ >>> from pyspark.sql import Row
+ >>> from pyspark.sql.types import *
+ >>> data = [(1, Row(name='Alice', age=2))]
+ >>> df = spark.createDataFrame(data, ("key", "value"))
+ >>> df.select(to_json(df.value).alias("json")).collect()
+ [Row(json=u'{"age":2,"name":"Alice"}')]
+ """
+
+ sc = SparkContext._active_spark_context
+ jc = sc._jvm.functions.to_json(_to_java_column(col), options)
+ return Column(jc)
+
+
@since(1.5)
def size(col):
"""