path: root/sql/hive-thriftserver/src/main/scala/org/apache/hive/service/server/HiveServerServerOptionsProcessor.scala
author: gatorsmile <gatorsmile@gmail.com> 2016-04-29 15:30:36 +0800
committer: Wenchen Fan <wenchen@databricks.com> 2016-04-29 15:30:36 +0800
commit: 222dcf79377df33007d7a9780dafa2c740dbe6a3
tree: e251b64b68f42d99d2de4ed96b95ca0b0ff1419c /sql/hive-thriftserver/src/main/scala/org/apache/hive/service/server/HiveServerServerOptionsProcessor.scala
parent: e249e6f8b551614c82cd62e827c3647166e918e3
[SPARK-12660][SPARK-14967][SQL] Implement Except Distinct by Left Anti Join
#### What changes were proposed in this pull request?

Replaces a logical `Except` operator with a `Left-anti Join` operator. This way, we can take advantage of all the benefits of join implementations (e.g. managed memory, code generation, broadcast joins).

```SQL
SELECT a1, a2 FROM Tab1 EXCEPT SELECT b1, b2 FROM Tab2
==> SELECT DISTINCT a1, a2 FROM Tab1 LEFT ANTI JOIN Tab2 ON a1<=>b1 AND a2<=>b2
```

Note:
1. This rule is only applicable to EXCEPT DISTINCT. Do not use it for EXCEPT ALL.
2. This rule has to be applied after de-duplicating the attributes; otherwise, the generated join conditions will be incorrect.

This PR also corrects the existing behavior in Spark. Before this PR, the behavior was:

```Scala
test("except") {
  val df_left = Seq(1, 2, 2, 3, 3, 4).toDF("id")
  val df_right = Seq(1, 3).toDF("id")
  checkAnswer(
    df_left.except(df_right),
    Row(2) :: Row(2) :: Row(4) :: Nil
  )
}
```

After this PR, the result is corrected: it strictly follows the SQL semantics of `EXCEPT DISTINCT`, so duplicate rows no longer appear in the output.

#### How was this patch tested?

Modified and added a few test cases to verify the optimization rule and the results of operators.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #12736 from gatorsmile/exceptByAntiJoin.
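For illustration only, the rewrite described above can be modeled with plain Scala collections, without any Spark dependency. This is a minimal sketch of the semantics, not the optimizer rule from this patch: the names `ExceptByAntiJoin`, `nullSafeEq`, `leftAntiJoin` and `exceptDistinct` are hypothetical, and `None` is assumed to stand in for SQL `NULL`.

```scala
// Hypothetical model of EXCEPT DISTINCT as DISTINCT over a left anti join.
// Not Spark code: rows are plain sequences, and None stands in for SQL NULL.
object ExceptByAntiJoin {
  type Row = Seq[Option[Int]]

  // Null-safe equality (<=>): NULL <=> NULL is true, unlike plain `=`.
  def nullSafeEq(a: Option[Int], b: Option[Int]): Boolean = a == b

  // LEFT ANTI JOIN: keep each left row that has no matching right row,
  // where a match means every column pair is null-safe equal.
  def leftAntiJoin(left: Seq[Row], right: Seq[Row]): Seq[Row] =
    left.filterNot { l =>
      right.exists { r =>
        l.length == r.length &&
          l.zip(r).forall { case (a, b) => nullSafeEq(a, b) }
      }
    }

  // EXCEPT DISTINCT: de-duplicate the anti-join result.
  def exceptDistinct(left: Seq[Row], right: Seq[Row]): Seq[Row] =
    leftAntiJoin(left, right).distinct

  def main(args: Array[String]): Unit = {
    // Same data as the test("except") case in the commit message.
    val left: Seq[Row] = Seq(1, 2, 2, 3, 3, 4).map(i => Seq(Option(i)))
    val right: Seq[Row] = Seq(1, 3).map(i => Seq(Option(i)))
    // Rows 2 and 4 remain, each exactly once.
    println(exceptDistinct(left, right).map(_.head.get))
  }
}
```

Matching on null-safe equality rather than plain equality is what lets a `NULL` row on the left be removed by a `NULL` row on the right, which is why the generated join condition uses `<=>`.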
Diffstat (limited to 'sql/hive-thriftserver/src/main/scala/org/apache/hive/service/server/HiveServerServerOptionsProcessor.scala')
0 files changed, 0 insertions, 0 deletions