path: root/python/pyspark/ml
author: Zhenhua Wang <wzh_zju@163.com> 2017-02-15 08:21:51 -0800
committer: Wenchen Fan <wenchen@databricks.com> 2017-02-15 08:21:51 -0800
commit: 601b9c3e6821b533a76d538f7f26bb622fd026fc
tree: a055009c0bbacb66a140d9d3c8761a51cbb35b00 /python/pyspark/ml
parent: 8b75f8c1c9acae9c5c0dee92ad4f50195bf185d4
[SPARK-17076][SQL] Cardinality estimation for join based on basic column statistics
## What changes were proposed in this pull request?

Support cardinality estimation and stats propagation for all join types.

Limitations:
- For inner/outer joins without any equal condition, we estimate the result as a Cartesian product.
- For left semi/anti joins, since we can't apply the inner-join heuristics to them, for now we just propagate the statistics from the left side. We should support them when other advanced stats (e.g. histograms) are available in Spark.

## How was this patch tested?

Add a new test suite.

Author: Zhenhua Wang <wzh_zju@163.com>
Author: wangzhenhua <wangzhenhua@huawei.com>

Closes #16228 from wzhfy/joinEstimate.
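
The rules described in this message can be illustrated with a small standalone sketch. It is not the code from this patch: the function name `estimate_join_rows` and its parameters are hypothetical, the equi-join formula (row-count product divided by the larger distinct count of the join key) is the common textbook heuristic and is assumed here rather than taken from the commit, and the lower bounds applied to outer joins are likewise an assumption. Only the Cartesian-product fallback and the left semi/anti propagation come from the message itself.

```python
from typing import Optional


def estimate_join_rows(
    join_type: str,
    left_rows: int,
    right_rows: int,
    left_ndv: Optional[int] = None,
    right_ndv: Optional[int] = None,
) -> int:
    """Estimate the output row count of a join (illustrative sketch only).

    left_ndv / right_ndv are the distinct-value counts of the join key on
    each side; pass None for both when the join has no equality condition.
    """
    # Left semi/anti joins: propagate the left side's row count unchanged,
    # as stated in the commit message.
    if join_type in ("left_semi", "left_anti"):
        return left_rows

    if join_type not in ("inner", "left_outer", "right_outer", "full_outer"):
        raise ValueError(f"unsupported join type: {join_type}")

    # No equality condition: estimate like a Cartesian product,
    # as stated in the commit message.
    if left_ndv is None or right_ndv is None:
        return left_rows * right_rows

    # ASSUMPTION: textbook equi-join heuristic, not confirmed by the source.
    inner = left_rows * right_rows // max(left_ndv, right_ndv, 1)

    # ASSUMPTION: an outer join keeps every row of its preserved side,
    # so the preserved side's row count is used as a lower bound.
    if join_type == "left_outer":
        return max(inner, left_rows)
    if join_type == "right_outer":
        return max(inner, right_rows)
    if join_type == "full_outer":
        return max(inner, left_rows, right_rows)
    return inner
```

For example, `estimate_join_rows("inner", 1000, 200, left_ndv=100, right_ndv=50)` divides the 200,000-row product by the larger distinct count (100), while the same call without distinct counts falls back to the full product.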
Diffstat (limited to 'python/pyspark/ml')
0 files changed, 0 insertions, 0 deletions