author: Davies Liu <davies@databricks.com> 2016-03-07 20:09:08 -0800
committer: Davies Liu <davies.liu@gmail.com> 2016-03-07 20:09:08 -0800
commit: 25bba58d160d0d24e40db1ca595200a52db922ed
tree: 9822c6c2f20af2a2faa68fd6d7e1d65921b2b877 /core
parent: da7bfac488b2a25c591986fe5f906b5c98dc34ea
[SPARK-13404] [SQL] Create variables for input row when it's actually used
## What changes were proposed in this pull request?

This PR changes how we generate the code for the output variables passed from a plan to its parent.

Right now, they are generated before calling the parent's `consume()`. This is inefficient: if the parent is a Filter or Join that filters out most of the rows, the time spent accessing columns that the Filter or Join does not use is wasted.

This PR improves this by deferring the access of columns until they are actually used by a plan. After this PR, a plan does not need to generate code to evaluate its output variables; it just passes the ExprCode to its parent via `consume()`. In `parent.consumeChild()`, the parent checks the child's output against `usedInputs` and generates the code for the columns that are part of `usedInputs` before calling `doConsume()`.

This PR also changes the `if` from
```
if (cond) {
  xxx
}
```
to
```
if (!cond) continue;
xxx
```
The new form helps reduce nested indentation when there are multiple levels of Filter and BroadcastHashJoin.

It also adds some comments for operators.

## How was this patch tested?

Unit tests. Manually ran TPCDS Q55; this PR improves performance by about 30% (scale=10, from 2.56s to 1.96s).

Author: Davies Liu <davies@databricks.com>

Closes #11274 from davies/gen_defer.
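
The effect of deferring column access can be sketched in plain Java. This is only an illustration, not the code Spark generates: the class `DeferredColumnDemo`, the helper `readB`, the counter `expensiveReads`, and the two-column row layout are all hypothetical names chosen for the example. The eager loop reads column `b` for every row before the filter runs; the deferred loop uses the `if (!cond) continue;` shape and reads `b` only for rows that pass the filter.

```java
/** Illustrative sketch only; all names here are hypothetical, not Spark APIs. */
class DeferredColumnDemo {
    // Counts how often the (assumed expensive) column b is read.
    static int expensiveReads = 0;

    // Stand-in for decoding a column from a row; assumed to be costly.
    static int readB(int[][] rows, int i) {
        expensiveReads++;
        return rows[i][1];
    }

    // Eager: both columns are materialized before the filter is evaluated.
    static long sumEager(int[][] rows) {
        long sum = 0;
        for (int i = 0; i < rows.length; i++) {
            int a = rows[i][0];
            int b = readB(rows, i); // read even if the row is filtered out
            if (a > 0) {
                sum += b;
            }
        }
        return sum;
    }

    // Deferred: the filter reads only column a; b is read after the row passes.
    static long sumDeferred(int[][] rows) {
        long sum = 0;
        for (int i = 0; i < rows.length; i++) {
            int a = rows[i][0];
            if (!(a > 0)) continue; // early exit keeps the loop body flat
            int b = readB(rows, i); // read only when actually used
            sum += b;
        }
        return sum;
    }
}
```

Both loops return the same sum, but the deferred version calls `readB` only for rows that survive the filter, which is where the saving comes from when the filter is selective.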
Diffstat (limited to 'core')
0 files changed, 0 insertions, 0 deletions