author: Herman van Hovell <hvanhovell@questtec.nl> 2016-04-06 19:25:10 -0700
committer: Reynold Xin <rxin@databricks.com> 2016-04-06 19:25:10 -0700
commit: d76592276f9f66fed8012d876595de8717f516a9
tree: bb3570eac8b6885efe77677d18cda30df7cb0a69 /graphx
parent: 4901086fea969a34ec312ef4a8f83d84e1bf21fb
[SPARK-12610][SQL] Left Anti Join
### What changes were proposed in this pull request?

This PR adds support for `LEFT ANTI JOIN` to Spark SQL. A `LEFT ANTI JOIN` is the exact opposite of a `LEFT SEMI JOIN` and can be used to identify rows in one dataset that are not in another dataset. Note that a `null` on the left side of the join cannot match a row on the right-hand side of the join; as a result, a left anti join will always select a row with a `null` in one or more of its keys.

We currently add support for the following SQL join syntax:

    SELECT * FROM tbl1 A LEFT ANTI JOIN tbl2 B ON A.Id = B.Id

Or using a dataframe:

    tbl1.as("a").join(tbl2.as("b"), $"a.id" === $"b.id", "left_anti")

This PR serves as the basis for implementing `NOT EXISTS` and `NOT IN (...)` correlated sub-queries. It would also serve as a good basis for implementing a more efficient `EXCEPT` operator.

The PR has been (loosely) based on PRs by both davies (https://github.com/apache/spark/pull/10706) and chenghao-intel (https://github.com/apache/spark/pull/10563); credit should be given where credit is due.

This PR adds support for `LEFT ANTI JOIN` to `BroadcastHashJoin` (including code generation), `ShuffledHashJoin` and `BroadcastNestedLoopJoin`.

### How was this patch tested?

Added tests to `JoinSuite` and ported `ExistenceJoinSuite` from https://github.com/apache/spark/pull/10563.

cc davies chenghao-intel rxin

Author: Herman van Hovell <hvanhovell@questtec.nl>

Closes #12214 from hvanhovell/SPARK-12610.
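
The null-handling rule described above can be illustrated without Spark. The following is a minimal sketch on plain Scala collections, not the implementation in this patch: `Row`, `leftAntiJoin`, and the sample contents of `tbl1` and `tbl2` are hypothetical names chosen for illustration, and a SQL `NULL` key is modeled as `None`.

```scala
// Sketch of LEFT ANTI JOIN semantics on an equality key, using plain collections.
// All names here (Row, leftAntiJoin, tbl1, tbl2) are hypothetical, not Spark APIs.
object LeftAntiJoinSketch {
  // A nullable join key is modeled as Option[Int]; None stands in for SQL NULL.
  final case class Row(id: Option[Int], name: String)

  def leftAntiJoin(left: Seq[Row], right: Seq[Row]): Seq[Row] = {
    // Only non-null right-hand keys can ever satisfy the equality condition.
    val rightKeys: Set[Int] = right.flatMap(_.id).toSet
    left.filter { row =>
      row.id match {
        case Some(k) => !rightKeys.contains(k) // keep rows that have no match
        case None    => true                   // a NULL key never matches, so keep the row
      }
    }
  }

  val tbl1: Seq[Row] = Seq(Row(Some(1), "a"), Row(Some(2), "b"), Row(None, "c"))
  val tbl2: Seq[Row] = Seq(Row(Some(2), "x"), Row(None, "y"))
}
```

With this sample data, `LeftAntiJoinSketch.leftAntiJoin(tbl1, tbl2)` keeps the row with key `1` (no match on the right) and the row with the `None` key (a null never matches, even against the null key in `tbl2`), and drops the row with key `2`.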
Diffstat (limited to 'graphx')
0 files changed, 0 insertions, 0 deletions