author | Liang-Chi Hsieh <viirya@gmail.com> | 2016-03-01 08:43:02 -0800 |
---|---|---|
committer | Davies Liu <davies.liu@gmail.com> | 2016-03-01 08:43:02 -0800 |
commit | c43899a04e4de18e238a1761bf4fe9f54e182320 (patch) | |
tree | 34c22b64f5034e14f7fe33b2975469ee0e09d2f5 /tags/README.md | |
parent | 12a2a57e1af21da0aa4275971365d76a8fc84a43 (diff) | |
download | spark-c43899a04e4de18e238a1761bf4fe9f54e182320.tar.gz spark-c43899a04e4de18e238a1761bf4fe9f54e182320.tar.bz2 spark-c43899a04e4de18e238a1761bf4fe9f54e182320.zip |
[SPARK-13511] [SQL] Add wholestage codegen for limit
JIRA: https://issues.apache.org/jira/browse/SPARK-13511
## What changes were proposed in this pull request?
The current limit operator doesn't support wholestage codegen. This PR adds support for it.
In the `doConsume` of `GlobalLimit` and `LocalLimit`, we use a count term to count the processed rows. Once the row count reaches the limit, we set the variable `stopEarly` of `BufferedRowIterator` (newly added in this PR) to `true`, which indicates that we want to stop processing the remaining rows. Then, when the wholestage codegen framework checks `shouldStop()`, it stops processing the row iterator.
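The counting and stop-early behavior described above can be sketched in plain Java. This is a simplified illustration, not Spark's generated code: `SketchRowIterator`, `consume`, `processAll`, and `run` are hypothetical names, and only `stopEarly` and `shouldStop()` are taken from the description. The sketch assumes the flag is set when the first row past the limit arrives, so the producer may read one row more than it emits.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitStopEarlySketch {

    /** Hypothetical stand-in for a row iterator with a stop-early flag. */
    static class SketchRowIterator {
        final long numRows;                 // size of the simulated input range
        final int limit;                    // limit applied by the operator
        final List<Long> output = new ArrayList<>();
        long rowsRead = 0;                  // rows pulled from the simulated source
        int count = 0;                      // the "count term" from the description
        boolean stopEarly = false;          // set once the limit has been reached

        SketchRowIterator(long numRows, int limit) {
            this.numRows = numRows;
            this.limit = limit;
        }

        /** Checked by the producer loop before it reads another row. */
        boolean shouldStop() {
            return stopEarly;
        }

        /** Limit logic: pass rows through until the count reaches the limit. */
        void consume(long row) {
            if (count < limit) {
                count++;
                output.add(row);
            } else {
                stopEarly = true;           // ask the producer to stop
            }
        }

        /** Producer loop: generates 0, 1, 2, ... until exhausted or told to stop. */
        void processAll() {
            while (rowsRead < numRows && !shouldStop()) {
                consume(rowsRead);
                rowsRead++;
            }
        }
    }

    static SketchRowIterator run(long numRows, int limit) {
        SketchRowIterator it = new SketchRowIterator(numRows, limit);
        it.processAll();
        return it;
    }

    public static void main(String[] args) {
        SketchRowIterator it = run(524288000L, 100);
        System.out.println("rows emitted: " + it.output.size());
        System.out.println("rows read:    " + it.rowsRead);
    }
}
```

The point of the flag is that the producer loop ends shortly after the limit is hit instead of scanning the whole range.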
Before this change, the executed plan for the query `sqlContext.range(N).limit(100).groupBy().sum()` is:
    TungstenAggregate(key=[], functions=[(sum(id#5L),mode=Final,isDistinct=false)], output=[sum(id)#6L])
    +- TungstenAggregate(key=[], functions=[(sum(id#5L),mode=Partial,isDistinct=false)], output=[sum#9L])
       +- GlobalLimit 100
          +- Exchange SinglePartition, None
             +- LocalLimit 100
                +- Range 0, 1, 1, 524288000, [id#5L]
After adding wholestage codegen support:
    WholeStageCodegen
    :  +- TungstenAggregate(key=[], functions=[(sum(id#40L),mode=Final,isDistinct=false)], output=[sum(id)#41L])
    :     +- TungstenAggregate(key=[], functions=[(sum(id#40L),mode=Partial,isDistinct=false)], output=[sum#44L])
    :        +- GlobalLimit 100
    :           +- INPUT
    +- Exchange SinglePartition, None
       +- WholeStageCodegen
          :  +- LocalLimit 100
          :     +- Range 0, 1, 1, 524288000, [id#40L]
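
For readers unfamiliar with the plan shape, the two limit operators can be read as a two-phase limit: `LocalLimit` truncates each partition, the exchange gathers the survivors into a single partition, and `GlobalLimit` truncates again before the aggregate runs. The plain-Java sketch below illustrates only that data flow. The class and method names are hypothetical, partitions are modeled as in-memory lists, and concatenating them in order is an assumption of the sketch, not a guarantee about which rows Spark keeps.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseLimitSketch {

    /** Keep at most `limit` rows from the front of one list of rows. */
    static List<Long> takeFirst(List<Long> rows, int limit) {
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    /** Local limit per partition, gather to one partition, global limit, then sum. */
    static long limitThenSum(List<List<Long>> partitions, int limit) {
        List<Long> single = new ArrayList<>();
        for (List<Long> partition : partitions) {
            // LocalLimit: no partition ever needs to ship more than `limit` rows.
            single.addAll(takeFirst(partition, limit));
        }
        long sum = 0;
        // GlobalLimit: enforce the limit once more over the gathered rows.
        for (long value : takeFirst(single, limit)) {
            sum += value;
        }
        return sum;
    }

    /** Split the range [0, n) into `numPartitions` contiguous chunks. */
    static List<List<Long>> rangePartitions(long n, int numPartitions) {
        List<List<Long>> partitions = new ArrayList<>();
        long chunk = (n + numPartitions - 1) / numPartitions;
        for (int p = 0; p < numPartitions; p++) {
            List<Long> rows = new ArrayList<>();
            for (long v = p * chunk; v < Math.min(n, (p + 1) * chunk); v++) {
                rows.add(v);
            }
            partitions.add(rows);
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(limitThenSum(rangePartitions(1000, 4), 100));
    }
}
```

The local phase bounds how much data crosses the exchange; the global phase is what makes the final row count correct.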
## How was this patch tested?
A benchmark test is added to `BenchmarkWholeStageCodegen`.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #11391 from viirya/wholestage-limit.