[SPARK-16351][SQL] Avoid per-record type dispatch in JSON when writing

## What changes were proposed in this pull request?

Currently, `JacksonGenerator.apply` performs type-based dispatch for each row to write the appropriate values. This is unnecessary because the schema is already known. Instead, the appropriate writers can be created once from the schema and then applied to each row. This approach is similar to `CatalystWriteSupport`.

This PR changes `JacksonGenerator` so that it creates all writers for the schema once and then applies them to each row, rather than dispatching on type for every row.

The benchmark was run with the code below:

```scala
test("Benchmark for JSON writer") {
  val N = 500 << 8
  // A single JSON record covering nested structs, arrays and numeric edge cases.
  val row =
    """{"struct":{"field1": true, "field2": 92233720368547758070},
      "structWithArrayFields":{"field1":[4, 5, 6], "field2":["str1", "str2"]},
      "arrayOfString":["str1", "str2"],
      "arrayOfInteger":[1, 2147483647, -2147483648],
      "arrayOfLong":[21474836470, 9223372036854775807, -9223372036854775808],
      "arrayOfBigInteger":[922337203685477580700, -922337203685477580800],
      "arrayOfDouble":[1.2, 1.7976931348623157E308, 4.9E-324, 2.2250738585072014E-308],
      "arrayOfBoolean":[true, false, true],
      "arrayOfNull":[null, null, null, null],
      "arrayOfStruct":[{"field1": true, "field2": "str1"}, {"field1": false}, {"field3": null}],
      "arrayOfArray1":[[1, 2, 3], ["str1", "str2"]],
      "arrayOfArray2":[[1, 2, 3], [1.1, 2.1, 3.1]]
     }"""
  // Build a DataFrame of N identical records.
  val df = spark.sqlContext.read.json(spark.sparkContext.parallelize(List.fill(N)(row)))

  val benchmark = new Benchmark("JSON writer", N)
  benchmark.addCase("writing JSON file", 10) { _ =>
    withTempPath { path =>
      df.write.format("json").save(path.getCanonicalPath)
    }
  }
  benchmark.run()
}
```

This produced the results below:

- **Before**

```
JSON writer:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
writing JSON file                             1675 / 1767          0.1       13087.5       1.0X
```

- **After**

```
JSON writer:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
writing JSON file                             1597 / 1686          0.1       12477.1       1.0X
```

In addition, I ran this benchmark 10 times for each version and calculated the average elapsed time:

| **Before** | **After** |
|------------|-----------|
| 17478ms    | 16669ms   |

This is an improvement of roughly 5%.

## How was this patch tested?

Existing tests should cover this.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #14028 from HyukjinKwon/SPARK-16351.
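The idea described in the commit message (resolve the type dispatch once per schema, then reuse the resulting writers for every row) can be illustrated with a small standalone sketch. Everything in it is hypothetical and chosen only for illustration: `WriterSketch`, `FieldType`, `Field`, `ValueWriter`, `makeWriter` and `RowWriter` are not Spark's classes, and this is not the code of `JacksonGenerator`. String escaping is deliberately omitted.

```scala
// Illustrative sketch only: a toy schema and JSON writer, not Spark's implementation.
object WriterSketch {
  // Hypothetical mini-schema standing in for a real DataType hierarchy.
  sealed trait FieldType
  case object IntType extends FieldType
  case object StrType extends FieldType
  case object BoolType extends FieldType
  final case class ArrType(element: FieldType) extends FieldType

  final case class Field(name: String, tpe: FieldType)

  // A writer appends one value of a known type to the output buffer.
  type ValueWriter = (StringBuilder, Any) => Unit

  // The pattern match on the type happens here, once per schema field,
  // and yields a closure that no longer needs to inspect the type.
  def makeWriter(tpe: FieldType): ValueWriter = {
    val typed: ValueWriter = tpe match {
      case IntType => (sb, v) => { sb.append(v.asInstanceOf[Int]); () }
      case BoolType => (sb, v) => { sb.append(v.asInstanceOf[Boolean]); () }
      case StrType =>
        // No escaping of quotes or backslashes: assumption made for brevity.
        (sb, v) => { sb.append('"').append(v.asInstanceOf[String]).append('"'); () }
      case ArrType(element) =>
        val elementWriter = makeWriter(element) // resolved once, reused per element
        (sb, v) => {
          sb.append('[')
          var first = true
          v.asInstanceOf[Seq[Any]].foreach { e =>
            if (!first) sb.append(',')
            elementWriter(sb, e)
            first = false
          }
          sb.append(']')
          ()
        }
    }
    // Wrap the typed writer so that null values are emitted as JSON null.
    (sb, v) => {
      if (v == null) sb.append("null") else typed(sb, v)
      ()
    }
  }

  final class RowWriter(schema: Seq[Field]) {
    // Built once per schema and reused for every row.
    private val names: Array[String] = schema.map(_.name).toArray
    private val writers: Array[ValueWriter] = schema.map(f => makeWriter(f.tpe)).toArray

    def write(row: Seq[Any]): String = {
      val sb = new StringBuilder
      sb.append('{')
      var i = 0
      while (i < writers.length) {
        if (i > 0) sb.append(',')
        sb.append('"').append(names(i)).append('"').append(':')
        writers(i)(sb, row(i)) // no per-row type dispatch here
        i += 1
      }
      sb.append('}')
      sb.toString
    }
  }
}
```

With a schema of `id: IntType`, `name: StrType` and `tags: ArrType(StrType)`, calling `write(Seq(1, "a", Seq("x", "y")))` on a `RowWriter` yields `{"id":1,"name":"a","tags":["x","y"]}`. The per-row loop only indexes into the prebuilt `writers` array, which is the part of the cost the PR aims to reduce.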
author: hyukjinkwon <gurwls223@gmail.com> 2016-07-18 09:49:14 -0700
committer: Yin Huai <yhuai@databricks.com> 2016-07-18 09:49:14 -0700
commit: 2877f1a5224c38c1fa0b85ef633ff935fae9dd83 (patch)
tree: d0422246a272108bbae88c6d46c42f932bc1a4c4 /sql/core/src/test
parent: 8ea3f4eaec65ee4277f9943063fcc9488d3fa924 (diff)
1 files changed, 0 insertions, 3 deletions
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
index 6c72019702..a09f61aba9 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
@@ -21,10 +21,7 @@ import java.io.{File, StringWriter}
 import java.nio.charset.StandardCharsets
 import java.sql.{Date, Timestamp}
 
-import scala.collection.JavaConverters._
-
 import com.fasterxml.jackson.core.JsonFactory
-import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{Path, PathFilter}
 import org.apache.hadoop.io.SequenceFile.CompressionType
 import org.apache.hadoop.io.compress.GzipCodec