diff options
author | Dongjoon Hyun <dongjoon@apache.org> | 2016-07-01 22:13:56 +0800 |
---|---|---|
committer | Cheng Lian <lian@databricks.com> | 2016-07-01 22:13:56 +0800 |
commit | c55397652ad1c6d047a8b8eb7fd92a8a1dc66306 (patch) | |
tree | d6c60d3a2b4703ca9f0440b556fae2a6e24e6365 /R/pkg/inst | |
parent | 0ad6ce7e54b1d8f5946dde652fa5341d15059158 (diff) | |
download | spark-c55397652ad1c6d047a8b8eb7fd92a8a1dc66306.tar.gz spark-c55397652ad1c6d047a8b8eb7fd92a8a1dc66306.tar.bz2 spark-c55397652ad1c6d047a8b8eb7fd92a8a1dc66306.zip |
[SPARK-16208][SQL] Add `PropagateEmptyRelation` optimizer
## What changes were proposed in this pull request?
This PR adds a new logical optimizer, `PropagateEmptyRelation`, to collapse a logical plans consisting of only empty LocalRelations.
**Optimizer Targets**
1. Binary(or Higher)-node Logical Plans
- Union with all empty children.
- Join with one or two empty children (including Intersect/Except).
2. Unary-node Logical Plans
- Project/Filter/Sample/Join/Limit/Repartition with all empty children.
- Aggregate with all empty children and without AggregateFunction expressions, COUNT.
- Generate with Explode because other UserDefinedGenerators like Hive UDTF returns results.
**Sample Query**
```sql
WITH t1 AS (SELECT a FROM VALUES 1 t(a)),
t2 AS (SELECT b FROM VALUES 1 t(b) WHERE 1=2)
SELECT a,b
FROM t1, t2
WHERE a=b
GROUP BY a,b
HAVING a>1
ORDER BY a,b
```
**Before**
```scala
scala> sql("with t1 as (select a from values 1 t(a)), t2 as (select b from values 1 t(b) where 1=2) select a,b from t1, t2 where a=b group by a,b having a>1 order by a,b").explain
== Physical Plan ==
*Sort [a#0 ASC, b#1 ASC], true, 0
+- Exchange rangepartitioning(a#0 ASC, b#1 ASC, 200)
+- *HashAggregate(keys=[a#0, b#1], functions=[])
+- Exchange hashpartitioning(a#0, b#1, 200)
+- *HashAggregate(keys=[a#0, b#1], functions=[])
+- *BroadcastHashJoin [a#0], [b#1], Inner, BuildRight
:- *Filter (isnotnull(a#0) && (a#0 > 1))
: +- LocalTableScan [a#0]
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
+- *Filter (isnotnull(b#1) && (b#1 > 1))
+- LocalTableScan <empty>, [b#1]
```
**After**
```scala
scala> sql("with t1 as (select a from values 1 t(a)), t2 as (select b from values 1 t(b) where 1=2) select a,b from t1, t2 where a=b group by a,b having a>1 order by a,b").explain
== Physical Plan ==
LocalTableScan <empty>, [a#0, b#1]
```
## How was this patch tested?
Pass the Jenkins tests (including a new testsuite).
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #13906 from dongjoon-hyun/SPARK-16208.
Diffstat (limited to 'R/pkg/inst')
0 files changed, 0 insertions, 0 deletions