author | Davies Liu <davies@databricks.com> | 2016-08-11 09:47:19 -0700 |
---|---|---|
committer | Davies Liu <davies.liu@gmail.com> | 2016-08-11 09:47:19 -0700 |
commit | 0f72e4f04b227b9cd5d7ae5958e09b1def49420a (patch) | |
tree | 1ec2e07daa2f57687aaac833219ae4029ac076de /sbin/stop-master.sh | |
parent | 4d496802f592dca96dada73b24afc93c668a7f26 (diff) | |
[SPARK-16958] [SQL] Reuse subqueries within the same query
## What changes were proposed in this pull request?
A query can contain multiple subqueries that produce the same result. We can reuse that result instead of running the subquery multiple times.
This PR also cleans up how we run subqueries.
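The reuse idea can be sketched as memoization keyed on a canonical form of the subquery. The snippet below is a toy illustration in Python, not Spark's implementation: `SubqueryCache`, `canonicalize`, and the `run_avg` executor are hypothetical names invented here, and whitespace/case normalization of SQL text stands in for whatever plan comparison the engine really does.

```python
from typing import Callable, Dict


def canonicalize(sql: str) -> str:
    """Toy normalization: collapse whitespace and lowercase.

    Assumption: a real engine would compare plans, not SQL text.
    """
    return " ".join(sql.lower().split())


class SubqueryCache:
    """Runs each distinct subquery once and shares the result (hypothetical)."""

    def __init__(self, run: Callable[[str], float]) -> None:
        self._run = run
        self._results: Dict[str, float] = {}
        self.executions = 0  # how many times a subquery was actually run

    def result(self, sql: str) -> float:
        key = canonicalize(sql)
        if key not in self._results:
            self.executions += 1
            self._results[key] = self._run(sql)
        return self._results[key]


# Stand-in for table t: ids 0..999.
ids = list(range(1000))


def run_avg(_sql: str) -> float:
    # Hypothetical executor: every subquery here is "select avg(id) from t".
    return sum(ids) / len(ids)


cache = SubqueryCache(run_avg)
threshold = cache.result("select avg(id) from t")   # used by the filter
projected = cache.result("SELECT avg(id)  FROM t")  # used by the projection
rows = [(i, projected) for i in ids if i > threshold]
```

Both lookups share one execution here because their normalized text matches; `cache.executions` records how often the executor actually ran.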
For the SQL query
```sql
select id,(select avg(id) from t) from t where id > (select avg(id) from t)
```
The explain output is
```
== Physical Plan ==
*Project [id#15L, Subquery subquery29 AS scalarsubquery()#35]
:  +- Subquery subquery29
:     +- *HashAggregate(keys=[], functions=[avg(id#15L)])
:        +- Exchange SinglePartition
:           +- *HashAggregate(keys=[], functions=[partial_avg(id#15L)])
:              +- *Range (0, 1000, splits=4)
+- *Filter (cast(id#15L as double) > Subquery subquery29)
   :  +- Subquery subquery29
   :     +- *HashAggregate(keys=[], functions=[avg(id#15L)])
   :        +- Exchange SinglePartition
   :           +- *HashAggregate(keys=[], functions=[partial_avg(id#15L)])
   :              +- *Range (0, 1000, splits=4)
   +- *Range (0, 1000, splits=4)
```
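In the plan above, the projection and the filter both refer to `subquery29`. A quick way to check this in explain text is to count the subquery names referenced by expressions. This is only a sketch over an abbreviated copy of the output above; it assumes that an identical subquery name in two places means the same subquery is shared, and the regular expression is tailored to this output rather than to any documented format.

```python
import re
from collections import Counter

# Abbreviated copy of the explain output (indentation does not matter here).
plan = """\
*Project [id#15L, Subquery subquery29 AS scalarsubquery()#35]
: +- Subquery subquery29
+- *Filter (cast(id#15L as double) > Subquery subquery29)
: +- Subquery subquery29
+- *Range (0, 1000, splits=4)
"""

# Keep references made inside expressions; skip the child-node lines
# ("+- Subquery ...") that print the subquery's own plan.
refs = [
    name
    for line in plan.splitlines()
    if not line.lstrip(": ").startswith("+- Subquery")
    for name in re.findall(r"Subquery (subquery\d+)", line)
]
counts = Counter(refs)
```

A name that appears more than once in `counts` is referenced from more than one operator.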
The visualized plan:
![reuse-subquery](https://cloud.githubusercontent.com/assets/40902/17573229/e578d93c-5f0d-11e6-8a3c-0150d81d3aed.png)
## How was this patch tested?
Existing tests.
Author: Davies Liu <davies@databricks.com>
Closes #14548 from davies/subq.