author: gatorsmile <gatorsmile@gmail.com> 2016-01-29 11:22:12 -0800
committer: Reynold Xin <rxin@databricks.com> 2016-01-29 11:22:12 -0800
commit: 5f686cc8b74ea9e36f56c31f14df90d134fd9343
tree: 282fbb236a8a20e5f2ba879c7adf44a2c182d129 /common/sketch
parent: c5f745ede01831b59c57effa7de88c648b82c13d
[SPARK-12656] [SQL] Implement Intersect with Left-semi Join
Our current Intersect physical operator simply delegates to RDD.intersect. We should remove the Intersect physical operator and instead transform a logical Intersect into a left-semi join plus a distinct. This way, we can take advantage of all the benefits of the join implementations (e.g. managed memory, code generation, broadcast joins).

After some searching, I found that one mainstream RDBMS does the same: in its query explain output, Intersect is replaced by a left-semi join. Left-semi joins can also help outer-join elimination in the Optimizer, as shown in the PR: https://github.com/apache/spark/pull/10566

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes #10630 from gatorsmile/IntersectBySemiJoin.
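To make the rewrite concrete, here is a toy, pure-Python model of its semantics on in-memory rows: a left-semi join on all columns, followed by a distinct, yields the same rows as a set intersection. This is a sketch under stated assumptions, not Spark's Catalyst code. All function names are hypothetical, the nested-loop join merely stands in for Spark's real join strategies, and the null-safe column comparison (NULL matches NULL, like SQL's `<=>`) is an assumption chosen to mirror SQL INTERSECT treating NULLs as equal.

```python
from typing import Any, Callable, Iterable, List, Tuple

Row = Tuple[Any, ...]


def null_safe_eq(x: Any, y: Any) -> bool:
    """Column equality where NULL (None) matches NULL, like SQL's <=> (assumed)."""
    if x is None or y is None:
        return x is None and y is None
    return x == y


def rows_match(l: Row, r: Row) -> bool:
    """Hypothetical join condition: every column pair is null-safe equal."""
    return len(l) == len(r) and all(null_safe_eq(a, b) for a, b in zip(l, r))


def left_semi_join(left: Iterable[Row], right: List[Row],
                   cond: Callable[[Row, Row], bool]) -> List[Row]:
    """Keep each left row (duplicates included) that has at least one match
    on the right. Only left rows are emitted; right rows never appear."""
    return [l for l in left if any(cond(l, r) for r in right)]


def distinct(rows: Iterable[Row]) -> List[Row]:
    """Drop duplicate rows, keeping first-seen order."""
    seen = set()
    out: List[Row] = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def intersect_via_semi_join(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    """INTERSECT modeled as a distinct over a left-semi join on all columns."""
    return distinct(left_semi_join(left, list(right), rows_match))


if __name__ == "__main__":
    left = [(1, "a"), (1, "a"), (2, None), (3, "c")]
    right = [(1, "a"), (2, None), (4, "d")]
    print(intersect_via_semi_join(left, right))  # [(1, 'a'), (2, None)]
```

The semi-join keeps both copies of `(1, "a")` from the left side and drops `(3, "c")`; the distinct then collapses the duplicates. Note that with plain SQL `=` semantics, the `(2, None)` row would not match, which is why the sketch compares columns null-safely.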
Diffstat (limited to 'common/sketch')
0 files changed, 0 insertions, 0 deletions