[SPARK-19408][SQL] filter estimation on two columns of same table - spark


author	Ron Hu <ron.hu@huawei.com>	2017-04-03 17:27:12 -0700
committer	Xiao Li <gatorsmile@gmail.com>	2017-04-03 17:27:12 -0700
commit	e7877fd4728ed41e440d7c4d8b6b02bd0d9e873e (patch)
tree	9d9619970eb8f9392edfe2c4d94f1d5234e8093d /sql/core/src/main
parent	58c9e6e77ae26345291dd9fce2c57aadcc36f66c (diff)

[SPARK-19408][SQL] filter estimation on two columns of same table

## What changes were proposed in this pull request?

In SQL queries, we also see predicate expressions involving two columns, such as "column-1 (op) column-2", where column-1 and column-2 belong to the same table. Note that if column-1 and column-2 belong to different tables, then it is a join operator's work, NOT a filter operator's work.

This PR estimates filter selectivity on two columns of the same table. For example, multiple TPC-H queries have the predicate "WHERE l_commitdate < l_receiptdate".

## How was this patch tested?

We added 6 new test cases to test various logical predicates involving two columns of the same table.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Ron Hu <ron.hu@huawei.com>
Author: U-CHINA\r00754707 <r00754707@R00754707-SC04.china.huawei.com>

Closes #17415 from ron8hu/filterTwoColumns.
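
To make the idea of two-column filter selectivity concrete, here is a minimal sketch in Python of how a predicate such as "l_commitdate < l_receiptdate" could be estimated from per-column min/max statistics. It is an illustration only, not the code in this patch: the `ColumnRange` class, the `estimate_less_than` function, and the default guess of 1/3 for partially overlapping ranges are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRange:
    """Hypothetical per-column statistics: observed min/max and NULL count."""
    lo: float
    hi: float
    null_count: int = 0


def estimate_less_than(left: ColumnRange, right: ColumnRange,
                       partial_overlap_guess: float = 1.0 / 3.0) -> float:
    """Estimate the fraction of rows satisfying `left < right`.

    Both columns are assumed to come from the same table, so the
    predicate is evaluated row by row by a filter, not by a join.
    """
    # Every left value is >= every right value: no row can qualify.
    if left.lo >= right.hi:
        return 0.0
    # Every left value is below every right value: all rows qualify,
    # provided neither column contains NULLs (NULL comparisons are not true).
    if left.hi < right.lo and left.null_count == 0 and right.null_count == 0:
        return 1.0
    # Otherwise min/max alone cannot decide, so fall back to a fixed
    # guess (an assumed heuristic, adjustable by the caller).
    return partial_overlap_guess


# Dates encoded as day numbers purely for illustration.
commit_dates = ColumnRange(lo=100, hi=200)
receipt_dates = ColumnRange(lo=150, hi=260)
selectivity = estimate_less_than(commit_dates, receipt_dates)
```

With these assumed ranges the two columns overlap only partially, so the sketch returns the fallback guess; disjoint ranges give 0.0 or 1.0 depending on their order.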

Diffstat (limited to 'sql/core/src/main')

0 files changed, 0 insertions, 0 deletions

