path: root/python/pyspark/mllib/random.py
author: Zongheng Yang <zongheng.y@gmail.com> 2014-07-31 19:32:16 -0700
committer: Michael Armbrust <michael@databricks.com> 2014-07-31 19:32:16 -0700
commit: 8f51491ea78d8e88fc664c2eac3b4ac14226d98f
tree: 280853242a7533e518e462806dfd83a3e653370e /python/pyspark/mllib/random.py
parent: ef4ff00f87a4e8d38866f163f01741c2673e41da
[SPARK-2531 & SPARK-2436] [SQL] Optimize the BuildSide when planning BroadcastNestedLoopJoin.
This PR resolves the following two tickets:

- [SPARK-2531](https://issues.apache.org/jira/browse/SPARK-2531): BNLJ currently assumes the build side is the right relation. This patch refactors some of its logic to take into account a BuildSide properly.
- [SPARK-2436](https://issues.apache.org/jira/browse/SPARK-2436): building on top of the above, we simply use the physical size statistics (if available) of both relations, and make the smaller relation the build side in the planner.

Author: Zongheng Yang <zongheng.y@gmail.com>

Closes #1448 from concretevitamin/bnlj-buildSide and squashes the following commits:

1780351 [Zongheng Yang] Use size estimation to decide optimal build side of BNLJ.
68e6c5b [Zongheng Yang] Consolidate two adjacent pattern matchings.
96d312a [Zongheng Yang] Use a while loop instead of collection methods chaining.
4bc525e [Zongheng Yang] Make BroadcastNestedLoopJoin take a BuildSide.
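
The build-side rule described for SPARK-2436 can be illustrated with a minimal Python sketch. It is not the planner code from this patch, which is written in Scala. The names `Relation`, `size_in_bytes`, and `choose_build_side` are hypothetical. Falling back to the right side when statistics are missing, and on ties, is an assumption modeled on the earlier behavior of always building on the right.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BuildSide(Enum):
    """Which input of the join is used as the build side."""
    LEFT = "left"
    RIGHT = "right"


@dataclass
class Relation:
    """Hypothetical stand-in for a planned relation and its size statistic."""
    name: str
    size_in_bytes: Optional[int] = None  # None means no statistics available


def choose_build_side(left: Relation, right: Relation) -> BuildSide:
    """Pick the smaller relation as the build side.

    Assumption: if either size is unknown, or the sizes are equal,
    keep the right relation as the build side.
    """
    if left.size_in_bytes is None or right.size_in_bytes is None:
        return BuildSide.RIGHT
    if left.size_in_bytes < right.size_in_bytes:
        return BuildSide.LEFT
    return BuildSide.RIGHT
```

For example, `choose_build_side(Relation("small", 10), Relation("big", 1000))` selects the left relation, while a pair with a missing statistic keeps the right one.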
Diffstat (limited to 'python/pyspark/mllib/random.py')
0 files changed, 0 insertions, 0 deletions