author | Davies Liu <davies@databricks.com> | 2015-12-07 10:34:18 -0800 |
---|---|---|
committer | Davies Liu <davies.liu@gmail.com> | 2015-12-07 10:34:18 -0800 |
commit | 9cde7d5fa87e7ddfff0b9c1212920a1d9000539b (patch) | |
tree | b60c9e34374eb5ef3d03f9ede34ffd95f9fc7c39 /python/pyspark/context.py | |
parent | 6fd9e70e3ed43836a0685507fff9949f921234f4 (diff) | |
[SPARK-12032] [SQL] Re-order inner joins to do join with conditions first
Currently, joins are executed in exactly the order in which they appear in the SQL query. As a result, some conditions may not be pushed down to the correct join, and those joins become cross products, which are extremely slow.
This patch tries to re-order inner joins (which are common in SQL queries): it picks the joins that have self-contained conditions first and delays those that do not have conditions.
After this patch, the TPCDS queries Q64/65 can run hundreds of times faster.
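The re-ordering idea described above can be illustrated with a small, self-contained sketch. This is not the Catalyst implementation from this patch; it is a simplified greedy model written in plain Python, and the names `Relation`, `Condition`, and `reorder_inner_joins` are hypothetical, introduced only for illustration. It assumes each condition can be summarized by the set of columns it references.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Relation(NamedTuple):
    # Hypothetical stand-in for a table in the FROM clause.
    name: str
    columns: FrozenSet[str]


class Condition(NamedTuple):
    # Hypothetical stand-in for a predicate; only the referenced columns matter here.
    text: str
    columns: FrozenSet[str]


def reorder_inner_joins(
    relations: List[Relation], conditions: List[Condition]
) -> List[Tuple[Relation, List[Condition]]]:
    """Greedily order inner joins so that joins with usable conditions come first.

    Returns the relations in join order, each paired with the conditions that
    can be evaluated once that relation has been joined in.
    """
    if not relations:
        return []

    first = relations[0]
    available = set(first.columns)
    # Conditions that only touch the first relation act as filters on it.
    applied = [c for c in conditions if c.columns <= available]
    pending = [c for c in conditions if c not in applied]
    plan = [(first, applied)]
    remaining = list(relations[1:])

    while remaining:
        chosen = None
        for rel in remaining:
            visible = available | rel.columns
            usable = [c for c in pending if c.columns <= visible]
            if usable:
                # This join has a self-contained condition: take it now.
                chosen = (rel, usable)
                break
        if chosen is None:
            # No remaining join has a usable condition; fall back to the
            # written order (this join is a cross product).
            chosen = (remaining[0], [])
        rel, usable = chosen
        remaining.remove(rel)
        pending = [c for c in pending if c not in usable]
        available |= rel.columns
        plan.append((rel, usable))

    return plan


# FROM a, b, c WHERE a.id = c.aid AND b.id = c.id
a = Relation("a", frozenset({"a.id"}))
b = Relation("b", frozenset({"b.id"}))
c = Relation("c", frozenset({"c.id", "c.aid"}))
conds = [
    Condition("a.id = c.aid", frozenset({"a.id", "c.aid"})),
    Condition("b.id = c.id", frozenset({"b.id", "c.id"})),
]
for rel, used in reorder_inner_joins([a, b, c], conds):
    print(rel.name, [cond.text for cond in used])
```

In the written order, `a JOIN b` has no applicable condition and would be a cross product. In this sketch, `c` is joined to `a` first because `a.id = c.aid` is self-contained at that point, and `b` is delayed until `b.id = c.id` can be applied.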
cc marmbrus nongli
Author: Davies Liu <davies@databricks.com>
Closes #10073 from davies/reorder_joins.
Diffstat (limited to 'python/pyspark/context.py')
0 files changed, 0 insertions, 0 deletions