author | Zongheng Yang <zongheng.y@gmail.com> | 2014-07-31 19:32:16 -0700 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2014-07-31 19:32:16 -0700 |
commit | 8f51491ea78d8e88fc664c2eac3b4ac14226d98f (patch) | |
tree | 280853242a7533e518e462806dfd83a3e653370e /python/pyspark/mllib/random.py | |
parent | ef4ff00f87a4e8d38866f163f01741c2673e41da (diff) | |
download | spark-8f51491ea78d8e88fc664c2eac3b4ac14226d98f.tar.gz spark-8f51491ea78d8e88fc664c2eac3b4ac14226d98f.tar.bz2 spark-8f51491ea78d8e88fc664c2eac3b4ac14226d98f.zip |
[SPARK-2531 & SPARK-2436] [SQL] Optimize the BuildSide when planning BroadcastNestedLoopJoin.
This PR resolves the following two tickets:
- [SPARK-2531](https://issues.apache.org/jira/browse/SPARK-2531): BroadcastNestedLoopJoin (BNLJ) currently assumes that the build side is the right relation. This patch refactors part of its logic so that it properly takes a BuildSide into account.
- [SPARK-2436](https://issues.apache.org/jira/browse/SPARK-2436): building on the above, we use the physical size statistics of both relations (when available) and make the smaller relation the build side in the planner.
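The size-based choice in SPARK-2436 can be illustrated with a small standalone sketch. It is not Spark's planner code. `Relation`, its `Option[BigInt]` size estimate, and `BuildSidePlanner.chooseBuildSide` are hypothetical stand-ins, and falling back to the right side when an estimate is missing (or when sizes tie) is an assumption chosen to mirror the old "right is the build side" behavior.

```scala
// Hypothetical, simplified model; these are not Spark's actual classes.
sealed trait BuildSide
case object BuildLeft extends BuildSide
case object BuildRight extends BuildSide

// Hypothetical stand-in for a relation with an optional physical size estimate.
final case class Relation(name: String, sizeInBytes: Option[BigInt])

object BuildSidePlanner {
  /** Picks the smaller relation as the build (broadcast) side.
    * Assumption: if either estimate is missing, or the sizes are equal,
    * keep the previous default and build the right side. */
  def chooseBuildSide(left: Relation, right: Relation): BuildSide =
    (left.sizeInBytes, right.sizeInBytes) match {
      case (Some(l), Some(r)) if l < r => BuildLeft
      case _                           => BuildRight
    }
}
```

For example, `chooseBuildSide(Relation("small", Some(BigInt(4096))), Relation("big", Some(BigInt(1L << 30))))` yields `BuildLeft`, because broadcasting the smaller relation keeps the materialized side cheap.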
Author: Zongheng Yang <zongheng.y@gmail.com>
Closes #1448 from concretevitamin/bnlj-buildSide and squashes the following commits:
1780351 [Zongheng Yang] Use size estimation to decide optimal build side of BNLJ.
68e6c5b [Zongheng Yang] Consolidate two adjacent pattern matchings.
96d312a [Zongheng Yang] Use a while loop instead of collection methods chaining.
4bc525e [Zongheng Yang] Make BroadcastNestedLoopJoin take a BuildSide.
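A minimal sketch of what "taking a BuildSide" means for a nested loop join, with the inner loop written as a `while` loop in the spirit of commit 96d312a. This is an illustrative model, not the actual `BroadcastNestedLoopJoin` operator: `NestedLoopJoinSketch.innerJoin` is a hypothetical name, it handles only inner joins (the real operator also deals with outer joins), and it works on plain Scala collections rather than RDDs of rows. The key point is that output pairs stay in (left, right) order whichever side is materialized.

```scala
// Hypothetical, simplified model; not Spark's BroadcastNestedLoopJoin.
sealed trait BuildSide
case object BuildLeft extends BuildSide
case object BuildRight extends BuildSide

object NestedLoopJoinSketch {
  /** Inner nested loop join. The side named by `buildSide` is materialized
    * (conceptually "broadcast"); the other side is streamed once.
    * Output pairs are always (leftRow, rightRow), regardless of build side. */
  def innerJoin[L, R](left: Seq[L], right: Seq[R], buildSide: BuildSide)
                     (condition: (L, R) => Boolean): Vector[(L, R)] = {
    val out = Vector.newBuilder[(L, R)]
    buildSide match {
      case BuildRight =>
        val built = right.toVector            // materialized build side
        for (l <- left) {                      // streamed side
          var i = 0
          while (i < built.length) {           // plain while loop over the build side
            val r = built(i)
            if (condition(l, r)) out += ((l, r))
            i += 1
          }
        }
      case BuildLeft =>
        val built = left.toVector
        for (r <- right) {
          var i = 0
          while (i < built.length) {
            val l = built(i)
            // Condition arguments keep their (left, right) order.
            if (condition(l, r)) out += ((l, r))
            i += 1
          }
        }
    }
    out.result()
  }
}
```

The two branches produce the same set of matches but may emit them in different orders, so callers should not rely on output ordering.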