[SPARK-17076][SQL] Cardinality estimation for join based on basic column statistics - spark


author	Zhenhua Wang <wzh_zju@163.com>	2017-02-15 08:21:51 -0800
committer	Wenchen Fan <wenchen@databricks.com>	2017-02-15 08:21:51 -0800
commit	601b9c3e6821b533a76d538f7f26bb622fd026fc (patch)
tree	a055009c0bbacb66a140d9d3c8761a51cbb35b00 /python/pyspark/ml
parent	8b75f8c1c9acae9c5c0dee92ad4f50195bf185d4 (diff)

[SPARK-17076][SQL] Cardinality estimation for join based on basic column statistics

## What changes were proposed in this pull request?

Support cardinality estimation and stats propagation for all join types.

Limitations:
- For inner/outer joins without any equality condition, we estimate the result like a cartesian product.
- For left semi/anti joins, since we can't apply the inner-join heuristics to them, for now we just propagate the statistics from the left side. We should support them when other advanced stats (e.g. histograms) are available in Spark.

## How was this patch tested?

Add a new test suite.

Author: Zhenhua Wang <wzh_zju@163.com>
Author: wangzhenhua <wangzhenhua@huawei.com>

Closes #16228 from wzhfy/joinEstimate.
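
The limitations described in the commit message can be illustrated with a minimal sketch. This is not the code from the patch: `TableStats` and `estimate_join_rows` are hypothetical names, and the equi-join formula (`|L| * |R| / max(ndv)`) and the outer-join lower bounds are common textbook heuristics assumed here for illustration only. The two cases taken directly from the commit message are the cartesian-product fallback and the left semi/anti propagation of the left side's row count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableStats:
    """Hypothetical container for basic statistics of one join side."""
    row_count: int
    key_ndv: Optional[int] = None  # distinct values of the join key, if known


def estimate_join_rows(join_type: str, left: TableStats, right: TableStats,
                       has_equi_condition: bool) -> int:
    """Estimate the output row count of a join (illustrative sketch only)."""
    # From the commit message: left semi/anti joins just propagate the left side.
    if join_type in ("left_semi", "left_anti"):
        return left.row_count

    cartesian = left.row_count * right.row_count

    # From the commit message: inner/outer joins without an equality
    # condition are estimated like a cartesian product.
    if not has_equi_condition:
        return cartesian

    if left.key_ndv is None or right.key_ndv is None:
        raise ValueError("distinct-value counts are required for equi-joins")

    # ASSUMPTION: textbook inner-join estimate, not confirmed by the source.
    max_ndv = max(left.key_ndv, right.key_ndv)
    inner = cartesian // max_ndv if max_ndv > 0 else 0

    # ASSUMPTION: an outer join keeps at least every row of its preserved side.
    if join_type == "inner":
        return inner
    if join_type == "left_outer":
        return max(inner, left.row_count)
    if join_type == "right_outer":
        return max(inner, right.row_count)
    if join_type == "full_outer":
        return (max(inner, left.row_count)
                + max(inner, right.row_count) - inner)
    raise ValueError(f"unsupported join type: {join_type}")


if __name__ == "__main__":
    orders = TableStats(row_count=1000, key_ndv=100)
    customers = TableStats(row_count=500, key_ndv=50)
    print(estimate_join_rows("inner", orders, customers, True))
    print(estimate_join_rows("inner", orders, customers, False))
    print(estimate_join_rows("left_semi", orders, customers, True))
```

Keeping the semi/anti and no-condition cases as early returns mirrors how the commit message frames them: as fallbacks used when the basic column statistics are not enough for a tighter estimate.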

Diffstat (limited to 'python/pyspark/ml')

0 files changed, 0 insertions, 0 deletions

