author | Ron Hu <ron.hu@huawei.com> | 2017-04-03 17:27:12 -0700 |
---|---|---|
committer | Xiao Li <gatorsmile@gmail.com> | 2017-04-03 17:27:12 -0700 |
commit | e7877fd4728ed41e440d7c4d8b6b02bd0d9e873e (patch) | |
tree | 9d9619970eb8f9392edfe2c4d94f1d5234e8093d /sql/core/src/main | |
parent | 58c9e6e77ae26345291dd9fce2c57aadcc36f66c (diff) | |
[SPARK-19408][SQL] filter estimation on two columns of same table
## What changes were proposed in this pull request?
In SQL queries, we also see predicate expressions involving two columns, such as "column-1 (op) column-2", where column-1 and column-2 belong to the same table. Note that if column-1 and column-2 belong to different tables, the predicate is a join operator's work, NOT a filter operator's work.
This PR estimates filter selectivity for predicates on two columns of the same table. For example, multiple TPC-H queries contain the predicate "WHERE l_commitdate < l_receiptdate".
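To illustrate the idea, here is a minimal sketch of range-based selectivity estimation for a predicate of the form "column-1 < column-2". It is not the code from this patch: the `ColumnRange` class, the `estimate_less_than` function, and the 1/3 fallback for partially overlapping ranges are assumptions made for this example, and it only considers min/max statistics on numeric values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRange:
    """Hypothetical min/max statistics for one column (dates as numbers)."""
    min: float
    max: float


# Assumed fallback when the ranges partly overlap and no histogram exists.
PARTIAL_OVERLAP_SELECTIVITY = 1.0 / 3.0


def estimate_less_than(left: ColumnRange, right: ColumnRange) -> float:
    """Estimate the fraction of rows for which `left < right` holds."""
    if left.max < right.min:
        # Every left value is below every right value: all rows qualify.
        return 1.0
    if left.min >= right.max:
        # No left value can be below any right value: no rows qualify.
        return 0.0
    # Ranges overlap; without finer statistics, fall back to a fixed guess.
    return PARTIAL_OVERLAP_SELECTIVITY


def estimate_row_count(row_count: int, selectivity: float) -> int:
    """Scale the input row count by the estimated selectivity."""
    return int(round(row_count * selectivity))
```

For instance, `estimate_less_than(ColumnRange(0, 10), ColumnRange(20, 30))` falls into the first branch, because the two ranges are disjoint and ordered. Ranges that partly overlap, as commit and receipt dates in the same table typically would, fall through to the assumed constant.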
## How was this patch tested?
We added 6 new test cases to test various logical predicates involving two columns of the same table.
Author: Ron Hu <ron.hu@huawei.com>
Author: U-CHINA\r00754707 <r00754707@R00754707-SC04.china.huawei.com>
Closes #17415 from ron8hu/filterTwoColumns.
Diffstat (limited to 'sql/core/src/main')
0 files changed, 0 insertions, 0 deletions