author | Josh Rosen <joshrosen@databricks.com> | 2015-05-08 22:09:55 -0400 |
---|---|---|
committer | Yin Huai <yhuai@databricks.com> | 2015-05-08 22:09:55 -0400 |
commit | cde5483884068b0ae1470b9b9b3ee54ab944ab12 (patch) | |
tree | 28d7d3f6cd5da3ae79e3fd1f0a53824775c03f51 /data/mllib/ridge-data | |
parent | 0a901dd3a1eb3fd459d45b771ce4ad2cfef2a944 (diff) | |
[SPARK-7375] [SQL] Avoid row copying in exchange when sort.serializeMapOutputs takes effect
This patch refactors the SQL `Exchange` operator's logic for determining whether map outputs need to be copied before being shuffled. As part of this change, we'll now avoid unnecessary copies in cases where sort-based shuffle operates on serialized map outputs (as in #4450 / SPARK-4550).
This patch also includes a change to copy the input to RangePartitioner partition bounds calculation, which is necessary because this calculation buffers mutable Java objects.
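The hazard behind both changes can be illustrated with a minimal Python sketch. This is not Spark code: `reused_rows` is a hypothetical stand-in for an upstream operator that reuses a single mutable row object, and `pickle` merely stands in for a serializer. It suggests why buffering object references requires a defensive copy, while serializing each record as it arrives does not.

```python
import pickle


def reused_rows(values):
    """Yield the same mutable list for every value, overwriting it in place.

    Hypothetical stand-in for an operator that reuses one mutable row object.
    """
    row = [None]
    for v in values:
        row[0] = v
        yield row


# 1. Buffering references: every buffered entry aliases the same object,
#    so all entries end up showing the last value written.
buffered_refs = [row for row in reused_rows([1, 2, 3])]

# 2. Buffering defensive copies: each entry is an independent snapshot.
buffered_copies = [list(row) for row in reused_rows([1, 2, 3])]

# 3. Serializing each record immediately: the bytes are already a snapshot,
#    so no extra copy is needed before buffering.
serialized = [pickle.dumps(row) for row in reused_rows([1, 2, 3])]
deserialized = [pickle.loads(b) for b in serialized]

print(buffered_refs)    # [[3], [3], [3]]
print(buffered_copies)  # [[1], [2], [3]]
print(deserialized)     # [[1], [2], [3]]
```

Case 2 corresponds loosely to copying the input that feeds a buffering step such as the partition bounds calculation, and case 3 to a shuffle path that works on serialized map outputs.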
Author: Josh Rosen <joshrosen@databricks.com>
Closes #5948 from JoshRosen/SPARK-7375 and squashes the following commits:
f305ff3 [Josh Rosen] Reduce scope of some variables in Exchange
899e1d7 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-7375
6a6bfce [Josh Rosen] Fix issue related to RangePartitioning:
ad006a4 [Josh Rosen] [SPARK-7375] Avoid defensive copying in exchange operator when sort.serializeMapOutputs takes effect.