[SPARK-16208][SQL] Add `PropagateEmptyRelation` optimizer - spark

diff options

author	Dongjoon Hyun <dongjoon@apache.org>	2016-07-01 22:13:56 +0800
committer	Cheng Lian <lian@databricks.com>	2016-07-01 22:13:56 +0800
commit	c55397652ad1c6d047a8b8eb7fd92a8a1dc66306 (patch)
tree	d6c60d3a2b4703ca9f0440b556fae2a6e24e6365 /R/pkg/inst
parent	0ad6ce7e54b1d8f5946dde652fa5341d15059158 (diff)
download	spark-c55397652ad1c6d047a8b8eb7fd92a8a1dc66306.tar.gz spark-c55397652ad1c6d047a8b8eb7fd92a8a1dc66306.tar.bz2 spark-c55397652ad1c6d047a8b8eb7fd92a8a1dc66306.zip

[SPARK-16208][SQL] Add `PropagateEmptyRelation` optimizer

## What changes were proposed in this pull request? This PR adds a new logical optimizer, `PropagateEmptyRelation`, to collapse a logical plans consisting of only empty LocalRelations. **Optimizer Targets** 1. Binary(or Higher)-node Logical Plans - Union with all empty children. - Join with one or two empty children (including Intersect/Except). 2. Unary-node Logical Plans - Project/Filter/Sample/Join/Limit/Repartition with all empty children. - Aggregate with all empty children and without AggregateFunction expressions, COUNT. - Generate with Explode because other UserDefinedGenerators like Hive UDTF returns results. **Sample Query** ```sql WITH t1 AS (SELECT a FROM VALUES 1 t(a)), t2 AS (SELECT b FROM VALUES 1 t(b) WHERE 1=2) SELECT a,b FROM t1, t2 WHERE a=b GROUP BY a,b HAVING a>1 ORDER BY a,b ``` **Before** ```scala scala> sql("with t1 as (select a from values 1 t(a)), t2 as (select b from values 1 t(b) where 1=2) select a,b from t1, t2 where a=b group by a,b having a>1 order by a,b").explain == Physical Plan == *Sort [a#0 ASC, b#1 ASC], true, 0 +- Exchange rangepartitioning(a#0 ASC, b#1 ASC, 200) +- *HashAggregate(keys=[a#0, b#1], functions=[]) +- Exchange hashpartitioning(a#0, b#1, 200) +- *HashAggregate(keys=[a#0, b#1], functions=[]) +- *BroadcastHashJoin [a#0], [b#1], Inner, BuildRight :- *Filter (isnotnull(a#0) && (a#0 > 1)) : +- LocalTableScan [a#0] +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))) +- *Filter (isnotnull(b#1) && (b#1 > 1)) +- LocalTableScan <empty>, [b#1] ``` **After** ```scala scala> sql("with t1 as (select a from values 1 t(a)), t2 as (select b from values 1 t(b) where 1=2) select a,b from t1, t2 where a=b group by a,b having a>1 order by a,b").explain == Physical Plan == LocalTableScan <empty>, [a#0, b#1] ``` ## How was this patch tested? Pass the Jenkins tests (including a new testsuite). Author: Dongjoon Hyun <dongjoon@apache.org> Closes #13906 from dongjoon-hyun/SPARK-16208.

Diffstat (limited to 'R/pkg/inst')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: