author: ihainan <ihainan72@gmail.com> 2015-08-30 08:26:14 +0100
committer: Sean Owen <sowen@cloudera.com> 2015-08-30 08:26:14 +0100
commit: 1bfd9347822df65e76201c4c471a26488d722319
tree: 44cbabe615ef4505f475c95b2908bdc91b248057 /core
parent: ca69fc8efda8a3e5442ffa16692a2b1eb86b7673
[SPARK-10184] [CORE] Optimization for bounds determination in RangePartitioner
JIRA Issue: https://issues.apache.org/jira/browse/SPARK-10184

Change `cumWeight > target` to `cumWeight >= target` in `RangePartitioner.determineBounds` method to make the output partitions more balanced.

Author: ihainan <ihainan72@gmail.com>

Closes #8397 from ihainan/opt_for_rangepartitioner.
Diffstat (limited to 'core')
-rw-r--r-- core/src/main/scala/org/apache/spark/Partitioner.scala 2
1 file changed, 1 insertion, 1 deletion
diff --git a/core/src/main/scala/org/apache/spark/Partitioner.scala b/core/src/main/scala/org/apache/spark/Partitioner.scala
index 4b9d59975b..29e581bb57 100644
--- a/core/src/main/scala/org/apache/spark/Partitioner.scala
+++ b/core/src/main/scala/org/apache/spark/Partitioner.scala
@@ -291,7 +291,7 @@ private[spark] object RangePartitioner {
     while ((i < numCandidates) && (j < partitions - 1)) {
       val (key, weight) = ordered(i)
       cumWeight += weight
-      if (cumWeight > target) {
+      if (cumWeight >= target) {
         // Skip duplicate values.
         if (previousBound.isEmpty || ordering.gt(key, previousBound.get)) {
           bounds += key
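
To illustrate why the inclusive comparison can balance partitions better, here is a minimal, self-contained sketch. It is not the Spark source: `BoundsSketch`, the `inclusive` flag, and `partitionSizes` are hypothetical names introduced for illustration, keys are fixed to `Int`, and the bookkeeping after `bounds += key` (advancing `target` and `j`) is an assumption, since the hunk above does not show it. The sketch also assumes a key is assigned to the first partition whose upper bound is greater than or equal to the key.

```scala
import scala.collection.mutable.ArrayBuffer

object BoundsSketch {
  // Simplified, hypothetical stand-in for RangePartitioner.determineBounds.
  // `inclusive = false` mimics the old test (>), `inclusive = true` the new one (>=).
  def determineBounds(
      candidates: Seq[(Int, Double)],
      partitions: Int,
      inclusive: Boolean): Array[Int] = {
    val ordered = candidates.sortBy(_._1)
    val step = ordered.map(_._2).sum / partitions
    var cumWeight = 0.0
    var target = step
    val bounds = ArrayBuffer.empty[Int]
    var previousBound = Option.empty[Int]
    var i = 0
    var j = 0
    while (i < ordered.size && j < partitions - 1) {
      val (key, weight) = ordered(i)
      cumWeight += weight
      val reached = if (inclusive) cumWeight >= target else cumWeight > target
      if (reached) {
        // Skip duplicate values.
        if (previousBound.isEmpty || key > previousBound.get) {
          bounds += key
          // Assumed bookkeeping (not shown in the hunk): move to the next target.
          target += step
          j += 1
          previousBound = Some(key)
        }
      }
      i += 1
    }
    bounds.toArray
  }

  // Counts keys per partition, assuming a key lands in the first partition
  // whose upper bound is >= the key, and in the last partition otherwise.
  def partitionSizes(keys: Seq[Int], bounds: Array[Int]): List[Int] = {
    val counts = Array.fill(bounds.length + 1)(0)
    keys.foreach { k =>
      val p = bounds.indexWhere(b => k <= b)
      counts(if (p < 0) bounds.length else p) += 1
    }
    counts.toList
  }

  def main(args: Array[String]): Unit = {
    // Six distinct keys with equal weight, split into three partitions.
    val candidates = (1 to 6).map(k => (k, 1.0))
    val keys = candidates.map(_._1)
    for (inclusive <- Seq(false, true)) {
      val b = determineBounds(candidates, 3, inclusive)
      val op = if (inclusive) ">=" else "> "
      println(s"$op bounds=${b.mkString(",")} sizes=${partitionSizes(keys, b).mkString(",")}")
    }
    // prints:
    // >  bounds=3,5 sizes=3,2,1
    // >= bounds=2,4 sizes=2,2,2
  }
}
```

In this toy input the cumulative weight lands exactly on each target, so the strict comparison only emits a bound one candidate late and the last partition ends up short. With `>=` the bound is emitted as soon as the target is reached.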