author | Hiroshi Inoue <inouehrs@jp.ibm.com> | 2016-06-30 21:47:44 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-06-30 21:47:44 -0700 |
commit | 14cf61e909598d9f6b9c3b920de7299e9bc828e0 (patch) | |
tree | 6cda21096caf50bfde8e4de43ae211f49ad14317 /external/kafka-0-10 | |
parent | aa6564f37f1d8de77c3b7bfa885000252efffea6 (diff) | |
[SPARK-16331][SQL] Reduce code generation time
## What changes were proposed in this pull request?
During code generation, a `LocalRelation` often holds a huge `Vector` object as its `data`. In the simple example below, the `LocalRelation` holds a `Vector` of 1000000 `UnsafeRow` elements.
```
val numRows = 1000000
val ds = (1 to numRows).toDS().persist()
benchmark.addCase("filter+reduce") { iter =>
  ds.filter(a => (a & 1) == 0).reduce(_ + _)
}
```
In `TreeNode.transformChildren`, every element of the vector is iterated unnecessarily to check whether the vector contains any children, because `Vector` is `Traversable`. This significantly increases code generation time.
This patch avoids this overhead by checking the number of children before iterating all elements; `LocalRelation` does not have children since it extends `LeafNode`.
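The idea can be illustrated with a minimal, self-contained sketch. This is not Spark's actual `TreeNode` code: `Node`, `Leaf`, `Unary`, `transformChildrenNaive`, `transformChildrenGuarded`, and the `inspected` counter are all hypothetical names introduced only for illustration, and the sketch assumes that child rewriting works by scanning constructor arguments via `productIterator`.

```scala
object LeafFastPathSketch {
  // Counts collection elements inspected while scanning arguments (illustration only).
  var inspected = 0L

  // Hypothetical stand-in for a tree node; not Catalyst's TreeNode.
  sealed trait Node extends Product {
    def children: Seq[Node]

    // Naive version: scans every constructor argument, including large collections.
    def transformChildrenNaive(rule: Node => Node): Seq[Any] =
      productIterator.map(arg => rewriteArg(rule, arg)).toList

    // Guarded version: a node without children returns before any scan happens.
    def transformChildrenGuarded(rule: Node => Node): Seq[Any] =
      if (children.isEmpty) productIterator.toList
      else productIterator.map(arg => rewriteArg(rule, arg)).toList

    private def rewriteArg(rule: Node => Node, arg: Any): Any = arg match {
      case n: Node if children.contains(n) => rule(n)
      case xs: Iterable[_] =>
        // Every element is visited just to see whether it is a child node.
        xs.map { x =>
          inspected += 1
          x match {
            case n: Node if children.contains(n) => rule(n)
            case other => other
          }
        }
      case other => other
    }
  }

  // Leaf carrying bulk data, loosely analogous to LocalRelation and its `data`.
  final case class Leaf(data: Vector[Int]) extends Node {
    def children: Seq[Node] = Nil
  }

  // Node with a single child, so the guarded version still rewrites it.
  final case class Unary(child: Node) extends Node {
    def children: Seq[Node] = Seq(child)
  }
}
```

In this sketch, calling `transformChildrenNaive` on a `Leaf` walks every element of `data`, while `transformChildrenGuarded` returns without touching it; a `Unary` node is rewritten the same way by both versions.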
Performance of the above example (best compilation time drops from 4426 ms to 3117 ms, roughly 30%):
```
without this patch
Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14 on Mac OS X 10.11.5
Intel(R) Core(TM) i5-5257U CPU 2.70GHz
compilationTime:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
filter+reduce                                  4426 / 4533          0.2        4426.0       1.0X

with this patch
compilationTime:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
filter+reduce                                  3117 / 3391          0.3        3116.6       1.0X
```
## How was this patch tested?
Using existing unit tests.
Author: Hiroshi Inoue <inouehrs@jp.ibm.com>
Closes #14000 from inouehrs/compilation-time-reduction.