author | Zhenhua Wang <wzh_zju@163.com> | 2017-02-15 08:21:51 -0800 |
---|---|---|
committer | Wenchen Fan <wenchen@databricks.com> | 2017-02-15 08:21:51 -0800 |
commit | 601b9c3e6821b533a76d538f7f26bb622fd026fc (patch) | |
tree | a055009c0bbacb66a140d9d3c8761a51cbb35b00 /python/pyspark/ml | |
parent | 8b75f8c1c9acae9c5c0dee92ad4f50195bf185d4 (diff) | |
[SPARK-17076][SQL] Cardinality estimation for join based on basic column statistics
## What changes were proposed in this pull request?
Support cardinality estimation and stats propagation for all join types.
Limitations:
- For inner/outer joins without any equality condition, we estimate the result as a Cartesian product.
- For left semi/anti joins, the heuristics for inner joins do not apply, so for now we just propagate the statistics from the left side. We should support them once more advanced stats (e.g. histograms) are available in Spark.
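
The row-count rules above can be illustrated with a minimal Python sketch. This is not the code in this patch: the helper name `estimate_join_rows`, its parameters, and the join-type strings are hypothetical. The equi-join branch assumes the textbook heuristic of dividing the product of the row counts by the larger number of distinct join-key values, and the outer-join adjustments (an outer join keeps at least every row of its preserved side) are likewise an assumption made for illustration.

```python
from typing import Optional


def estimate_join_rows(
    join_type: str,
    left_rows: int,
    right_rows: int,
    left_ndv: Optional[int] = None,
    right_ndv: Optional[int] = None,
) -> int:
    """Hypothetical row-count estimate for a join of two relations.

    left_ndv / right_ndv are the distinct-value counts of the join key on
    each side; leave them as None when there is no equality condition.
    """
    # Left semi/anti joins: propagate the left side's row count unchanged.
    if join_type in ("left_semi", "left_anti"):
        return left_rows

    if left_ndv is None or right_ndv is None:
        # No equality condition: estimate as a Cartesian product.
        inner = left_rows * right_rows
    else:
        # Assumed textbook heuristic: |L| * |R| / max(ndv(L.k), ndv(R.k)),
        # rounded up; the divisor is clamped to 1 to avoid division by zero.
        denom = max(left_ndv, right_ndv, 1)
        inner = -(-(left_rows * right_rows) // denom)

    if join_type == "inner":
        return inner
    # Assumption: an outer join keeps every row of its preserved side(s).
    if join_type == "left_outer":
        return max(inner, left_rows)
    if join_type == "right_outer":
        return max(inner, right_rows)
    if join_type == "full_outer":
        return max(inner, left_rows) + max(inner, right_rows) - inner
    raise ValueError(f"unsupported join type: {join_type}")
```

For example, `estimate_join_rows("inner", 1000, 200, 100, 50)` gives 2000 under these assumptions, because the larger distinct count (100) is the divisor of the 200,000-row product.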
## How was this patch tested?
Added a new test suite.
Author: Zhenhua Wang <wzh_zju@163.com>
Author: wangzhenhua <wangzhenhua@huawei.com>
Closes #16228 from wzhfy/joinEstimate.
Diffstat (limited to 'python/pyspark/ml')
0 files changed, 0 insertions, 0 deletions