author | Herman van Hovell <hvanhovell@questtec.nl> | 2016-04-06 19:25:10 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-04-06 19:25:10 -0700 |
commit | d76592276f9f66fed8012d876595de8717f516a9 (patch) | |
tree | bb3570eac8b6885efe77677d18cda30df7cb0a69 /mllib/src/test/scala | |
parent | 4901086fea969a34ec312ef4a8f83d84e1bf21fb (diff) | |
download | spark-d76592276f9f66fed8012d876595de8717f516a9.tar.gz spark-d76592276f9f66fed8012d876595de8717f516a9.tar.bz2 spark-d76592276f9f66fed8012d876595de8717f516a9.zip |
[SPARK-12610][SQL] Left Anti Join
### What changes were proposed in this pull request?
This PR adds support for `LEFT ANTI JOIN` to Spark SQL. A `LEFT ANTI JOIN` is the exact opposite of a `LEFT SEMI JOIN` and can be used to identify rows in one dataset that are not in another dataset. Note that a `null` key on the left side of the join cannot match any row on the right-hand side of the join; as a result, a left anti join will always select a left-side row that has a `null` in one or more of its keys.
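The semantics described above can be sketched with plain Scala collections. This is only an illustrative model, not the code added by this PR: `Record`, `leftAntiJoin`, and the sample data are hypothetical names, and `None` is assumed to stand in for a SQL `null` key.

```scala
// Illustrative model only: plain Scala collections, not Spark's implementation.
// `Record` and `leftAntiJoin` are hypothetical names; `None` models a SQL NULL key.
object LeftAntiJoinSketch {
  final case class Record(id: Option[Int], name: String)

  def leftAntiJoin(left: Seq[Record], right: Seq[Record]): Seq[Record] = {
    // Only non-null right-side keys can ever satisfy `A.Id = B.Id`.
    val rightKeys: Set[Int] = right.flatMap(_.id).toSet
    // Keep a left row when no right row matches it; a null (None) key never matches.
    left.filterNot(rec => rec.id.exists(rightKeys.contains))
  }

  val tbl1: Seq[Record] = Seq(Record(Some(1), "a"), Record(Some(2), "b"), Record(None, "c"))
  val tbl2: Seq[Record] = Seq(Record(Some(2), "x"), Record(None, "y"))

  // The row with id 2 has a match and is dropped; the null-keyed row is kept.
  val result: Seq[Record] = leftAntiJoin(tbl1, tbl2)
}
```

In this sketch the null-keyed row on the left survives even though the right side also contains a null key, which mirrors the behaviour described in the paragraph above.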
We currently add support for the following SQL join syntax:
    SELECT *
    FROM tbl1 A
    LEFT ANTI JOIN tbl2 B
    ON A.Id = B.Id

Or using a DataFrame:

    tbl1.as("a").join(tbl2.as("b"), $"a.id" === $"b.id", "left_anti")
This PR serves as the basis for implementing `NOT EXISTS` and `NOT IN (...)` correlated sub-queries. It would also serve as a good basis for implementing a more efficient `EXCEPT` operator.
The PR has been (loosely) based on PRs by both davies (https://github.com/apache/spark/pull/10706) and chenghao-intel (https://github.com/apache/spark/pull/10563); credit should be given where credit is due.
This PR adds support for `LEFT ANTI JOIN` to `BroadcastHashJoin` (including code generation), `ShuffledHashJoin` and `BroadcastNestedLoopJoin`.
### How was this patch tested?
Added tests to `JoinSuite` and ported `ExistenceJoinSuite` from https://github.com/apache/spark/pull/10563.
cc davies chenghao-intel rxin
Author: Herman van Hovell <hvanhovell@questtec.nl>
Closes #12214 from hvanhovell/SPARK-12610.
Diffstat (limited to 'mllib/src/test/scala')
0 files changed, 0 insertions, 0 deletions