aboutsummaryrefslogtreecommitdiff
path: root/sql/hive
diff options
context:
space:
mode:
authorMichael Armbrust <michael@databricks.com>2015-10-08 14:28:14 -0700
committerYin Huai <yhuai@databricks.com>2015-10-08 14:28:14 -0700
commit9e66a53c9955285a85c19f55c3ef62db2e1b868a (patch)
tree509ebb5f6eb271c210616603148b95e8748f24cf /sql/hive
parenta8226a9f14e81c0b6712a30f1a60276200faebac (diff)
downloadspark-9e66a53c9955285a85c19f55c3ef62db2e1b868a.tar.gz
spark-9e66a53c9955285a85c19f55c3ef62db2e1b868a.tar.bz2
spark-9e66a53c9955285a85c19f55c3ef62db2e1b868a.zip
[SPARK-10993] [SQL] Inital code generated encoder for product types
This PR is a first cut at code generating an encoder that takes a Scala `Product` type and converts it directly into the tungsten binary format. This is done through the addition of a new set of expression that can be used to invoke methods on raw JVM objects, extracting fields and converting the result into the required format. These can then be used directly in an `UnsafeProjection` allowing us to leverage the existing encoding logic. According to some simple benchmarks, this can significantly speed up conversion (~4x). However, replacing CatalystConverters is deferred to a later PR to keep this PR at a reasonable size. ```scala case class SomeInts(a: Int, b: Int, c: Int, d: Int, e: Int) val data = SomeInts(1, 2, 3, 4, 5) val encoder = ProductEncoder[SomeInts] val converter = CatalystTypeConverters.createToCatalystConverter(ScalaReflection.schemaFor[SomeInts].dataType) (1 to 5).foreach {iter => benchmark(s"converter $iter") { var i = 100000000 while (i > 0) { val res = converter(data).asInstanceOf[InternalRow] assert(res.getInt(0) == 1) assert(res.getInt(1) == 2) i -= 1 } } benchmark(s"encoder $iter") { var i = 100000000 while (i > 0) { val res = encoder.toRow(data) assert(res.getInt(0) == 1) assert(res.getInt(1) == 2) i -= 1 } } } ``` Results: ``` [info] converter 1: 7170ms [info] encoder 1: 1888ms [info] converter 2: 6763ms [info] encoder 2: 1824ms [info] converter 3: 6912ms [info] encoder 3: 1802ms [info] converter 4: 7131ms [info] encoder 4: 1798ms [info] converter 5: 7350ms [info] encoder 5: 1912ms ``` Author: Michael Armbrust <michael@databricks.com> Closes #9019 from marmbrus/productEncoder.
Diffstat (limited to 'sql/hive')
0 files changed, 0 insertions, 0 deletions