author: hyukjinkwon <gurwls223@gmail.com> 2017-03-19 22:33:01 -0700
committer: Felix Cheung <felixcheung@apache.org> 2017-03-19 22:33:01 -0700
commit: 0cdcf9114527a2c359c25e46fd6556b3855bfb28 (patch)
tree: b315a01420500d41669e9436658626f8890b7143 /python/pyspark
parent: 990af630d0d569880edd9c7ce9932e10037a28ab (diff)
[SPARK-19849][SQL] Support ArrayType in to_json to produce JSON array
## What changes were proposed in this pull request?
This PR proposes to support an array of struct type in `to_json` as below:
```scala
import org.apache.spark.sql.functions._
val df = Seq(Tuple1(Tuple1(1) :: Nil)).toDF("a")
df.select(to_json($"a").as("json")).show()
```
```
+----------+
| json|
+----------+
|[{"_1":1}]|
+----------+
```
Currently, it throws an exception as below (a newline manually inserted for readability):
```
org.apache.spark.sql.AnalysisException: cannot resolve 'structtojson(`array`)' due to data type
mismatch: structtojson requires that the expression is a struct expression.;;
```
This also enables a round trip with `from_json`, as below:
```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))
val df = Seq("""[{"a":1}, {"a":2}]""").toDF("json").select(from_json($"json", schema).as("array"))
df.show()
// Read back.
df.select(to_json($"array").as("json")).show()
```
```
+----------+
| array|
+----------+
|[[1], [2]]|
+----------+
+-----------------+
| json|
+-----------------+
|[{"a":1},{"a":2}]|
+-----------------+
```
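Independent of Spark, the JSON shape that `to_json` produces for an array of structs can be sketched with the standard `json` module (the dicts below are hypothetical stand-ins for the rows; compact separators mimic Spark's whitespace-free output):

```python
import json

# Plain dicts standing in for the struct values of the array column.
rows = [{"age": 2, "name": "Alice"}, {"age": 3, "name": "Bob"}]

# to_json on an ArrayType of StructType emits a single JSON array string;
# compact separators and sorted keys reproduce that shape here.
array_json = json.dumps(rows, separators=(",", ":"), sort_keys=True)
print(array_json)  # [{"age":2,"name":"Alice"},{"age":3,"name":"Bob"}]

# from_json is the inverse: parsing the string recovers the array of structs,
# which json.loads models as a list of dicts.
assert json.loads(array_json) == rows
```

This is only an analogy for the string format; the actual parsing and generation in Spark happen per-row inside the `JsonToStructs` and `StructsToJson` expressions.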
Also, this PR proposes to rename `StructToJson` to `StructsToJson` and `JsonToStruct` to `JsonToStructs`.
## How was this patch tested?
Unit tests in `JsonFunctionsSuite` and `JsonExpressionsSuite` for Scala, a doctest for Python, and a test in `test_sparkSQL.R` for R.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #17192 from HyukjinKwon/SPARK-19849.
Diffstat (limited to 'python/pyspark')
-rw-r--r-- | python/pyspark/sql/functions.py | 15 |
1 file changed, 10 insertions, 5 deletions
```diff
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 376b86ea69..f9121e60f3 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -1774,10 +1774,11 @@ def json_tuple(col, *fields):
 def from_json(col, schema, options={}):
     """
     Parses a column containing a JSON string into a [[StructType]] or [[ArrayType]]
-    with the specified schema. Returns `null`, in the case of an unparseable string.
+    of [[StructType]]s with the specified schema. Returns `null`, in the case of an unparseable
+    string.
 
     :param col: string column in json format
-    :param schema: a StructType or ArrayType to use when parsing the json column
+    :param schema: a StructType or ArrayType of StructType to use when parsing the json column
     :param options: options to control parsing. accepts the same options as the json datasource
 
     >>> from pyspark.sql.types import *
@@ -1802,10 +1803,10 @@ def from_json(col, schema, options={}):
 @since(2.1)
 def to_json(col, options={}):
     """
-    Converts a column containing a [[StructType]] into a JSON string. Throws an exception,
-    in the case of an unsupported type.
+    Converts a column containing a [[StructType]] or [[ArrayType]] of [[StructType]]s into a
+    JSON string. Throws an exception, in the case of an unsupported type.
 
-    :param col: name of column containing the struct
+    :param col: name of column containing the struct or array of the structs
     :param options: options to control converting. accepts the same options as the json datasource
 
     >>> from pyspark.sql import Row
@@ -1814,6 +1815,10 @@ def to_json(col, options={}):
     >>> df = spark.createDataFrame(data, ("key", "value"))
     >>> df.select(to_json(df.value).alias("json")).collect()
     [Row(json=u'{"age":2,"name":"Alice"}')]
+    >>> data = [(1, [Row(name='Alice', age=2), Row(name='Bob', age=3)])]
+    >>> df = spark.createDataFrame(data, ("key", "value"))
+    >>> df.select(to_json(df.value).alias("json")).collect()
+    [Row(json=u'[{"age":2,"name":"Alice"},{"age":3,"name":"Bob"}]')]
     """
     sc = SparkContext._active_spark_context
```