author | Dongjoon Hyun <dongjoon@apache.org> | 2016-06-02 09:48:58 -0700 |
---|---|---|
committer | Wenchen Fan <wenchen@databricks.com> | 2016-06-02 09:48:58 -0700 |
commit | 63b7f127caf2fdf96eeb8457afd6c96bc8309a58 (patch) | |
tree | 117bc8d83080df4dec464edbfdbc9664ef158e25 /core/src | |
parent | 252417fa21eb47781addfd614ff00dac793b52a9 (diff) | |
download | spark-63b7f127caf2fdf96eeb8457afd6c96bc8309a58.tar.gz spark-63b7f127caf2fdf96eeb8457afd6c96bc8309a58.tar.bz2 spark-63b7f127caf2fdf96eeb8457afd6c96bc8309a58.zip |
[SPARK-15076][SQL] Add ReorderAssociativeOperator optimizer
## What changes were proposed in this pull request?
This issue adds a new optimizer, `ReorderAssociativeOperator`, that takes advantage of the associative property of integral arithmetic. Currently, Spark behaves as follows.
1) Can optimize `1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + a` into `45 + a`.
2) Cannot optimize `a + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9`.
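The difference between the two cases follows from how the expression is parsed: `+` is left-associative, so `a + 1 + 2` becomes `((a + 1) + 2)`, and a bottom-up constant folder only collapses subtrees made up entirely of literals. The sketch below illustrates this with a toy expression tree. `Expr`, `Lit`, `Attr`, `Add`, and `FoldDemo` are hypothetical names used for illustration, not Catalyst classes, and the folding logic is a simplification rather than Spark's actual constant-folding rule.

```scala
// Toy expression tree; hypothetical names, not Catalyst classes.
sealed trait Expr
case class Lit(value: Int) extends Expr
case class Attr(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

object FoldDemo {
  // Bottom-up folding: an Add collapses only when both folded children are literals.
  def fold(e: Expr): Expr = e match {
    case Add(l, r) =>
      (fold(l), fold(r)) match {
        case (Lit(x), Lit(y)) => Lit(x + y)
        case (fl, fr)         => Add(fl, fr)
      }
    case other => other
  }

  // Left-nested tree, as a parser would build for `t1 + t2 + ... + tn`.
  def chain(terms: Seq[Expr]): Expr = terms.reduceLeft[Expr]((l, r) => Add(l, r))

  val literals: Seq[Expr] = (1 to 9).map(i => Lit(i))
  val case1: Expr = chain(literals :+ Attr("a")) // 1 + 2 + ... + 9 + a
  val case2: Expr = chain(Attr("a") +: literals) // a + 1 + 2 + ... + 9
}
```

Under these assumptions, `FoldDemo.fold(FoldDemo.case1)` yields `Add(Lit(45), Attr("a"))`, while `FoldDemo.fold(FoldDemo.case2)` returns the tree unchanged, because every `Add` node has `a` somewhere in its left subtree.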
This PR handles Case 2 for **Add/Multiply** expressions whose data types are `ByteType`, `ShortType`, `IntegerType`, or `LongType`. The following compares the plans `before` and `after` this change.
**Before**
```scala
scala> sql("select a+1+2+3+4+5+6+7+8+9 from (select explode(array(1)) a)").explain
== Physical Plan ==
WholeStageCodegen
:  +- Project [(((((((((a#7 + 1) + 2) + 3) + 4) + 5) + 6) + 7) + 8) + 9) AS (((((((((a + 1) + 2) + 3) + 4) + 5) + 6) + 7) + 8) + 9)#8]
:     +- INPUT
+- Generate explode([1]), false, false, [a#7]
   +- Scan OneRowRelation[]
scala> sql("select a*1*2*3*4*5*6*7*8*9 from (select explode(array(1)) a)").explain
== Physical Plan ==
*Project [(((((((((a#18 * 1) * 2) * 3) * 4) * 5) * 6) * 7) * 8) * 9) AS (((((((((a * 1) * 2) * 3) * 4) * 5) * 6) * 7) * 8) * 9)#19]
+- Generate explode([1]), false, false, [a#18]
   +- Scan OneRowRelation[]
```
**After**
```scala
scala> sql("select a+1+2+3+4+5+6+7+8+9 from (select explode(array(1)) a)").explain
== Physical Plan ==
WholeStageCodegen
:  +- Project [(a#7 + 45) AS (((((((((a + 1) + 2) + 3) + 4) + 5) + 6) + 7) + 8) + 9)#8]
:     +- INPUT
+- Generate explode([1]), false, false, [a#7]
   +- Scan OneRowRelation[]
scala> sql("select a*1*2*3*4*5*6*7*8*9 from (select explode(array(1)) a)").explain
== Physical Plan ==
*Project [(a#18 * 362880) AS (((((((((a * 1) * 2) * 3) * 4) * 5) * 6) * 7) * 8) * 9)#19]
+- Generate explode([1]), false, false, [a#18]
   +- Scan OneRowRelation[]
```
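The rewrite visible in the plans above can be sketched as three steps: flatten a chain of the same operator into its operands, fold all constant operands into one value, and rebuild the chain with that single constant at the end. The code below is a minimal sketch of that idea on a toy tree, not the code in this patch. `Node`, `Const`, `Col`, `Plus`, `Times`, and `ReorderSketch` are hypothetical names, and the sketch only handles the top-level chain. It uses `Long` arithmetic, which wraps on overflow and therefore stays associative. The same regrouping would not be exact for floating-point values.

```scala
// Toy expression tree over Long values; hypothetical names, not Catalyst classes.
sealed trait Node
case class Const(value: Long) extends Node
case class Col(name: String) extends Node
case class Plus(left: Node, right: Node) extends Node
case class Times(left: Node, right: Node) extends Node

object ReorderSketch {
  // Flatten a nested chain of one operator into its operands, left to right.
  def flattenPlus(n: Node): Seq[Node] = n match {
    case Plus(l, r) => flattenPlus(l) ++ flattenPlus(r)
    case other      => Seq(other)
  }

  def flattenTimes(n: Node): Seq[Node] = n match {
    case Times(l, r) => flattenTimes(l) ++ flattenTimes(r)
    case other       => Seq(other)
  }

  // Fold all constant operands into one value and append it to the
  // remaining operands. Leave the tree alone when nothing would be gained.
  private def regroup(
      operands: Seq[Node],
      combine: (Long, Long) => Long,
      build: (Node, Node) => Node,
      original: Node): Node = {
    val consts = operands.collect { case Const(v) => v }
    val others = operands.filterNot(_.isInstanceOf[Const])
    if (consts.size > 1 && others.nonEmpty)
      build(others.reduceLeft(build), Const(consts.reduceLeft(combine)))
    else original
  }

  // Reorder only the top-level chain; nested chains of the other operator
  // are treated as opaque operands in this sketch.
  def reorder(n: Node): Node = n match {
    case p: Plus  => regroup(flattenPlus(p), _ + _, Plus(_, _), p)
    case t: Times => regroup(flattenTimes(t), _ * _, Times(_, _), t)
    case other    => other
  }
}
```

Applied to the left-nested tree for `a + 1 + ... + 9`, `ReorderSketch.reorder` returns `Plus(Col("a"), Const(45))`, and for `a * 1 * ... * 9` it returns `Times(Col("a"), Const(362880))`, matching the shape of the projected expressions in the **After** plans.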
This PR was greatly generalized by cloud-fan's key ideas; he should be credited for the work he did.
## How was this patch tested?
Passes the Jenkins tests, including a new test suite.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12850 from dongjoon-hyun/SPARK-15076.
Diffstat (limited to 'core/src')
0 files changed, 0 insertions, 0 deletions