From 9dbd4b864efacd09a8353d00c998be87f9eeacb2 Mon Sep 17 00:00:00 2001
From: David Navas
Date: Sat, 17 Sep 2016 16:22:23 +0100
Subject: [SPARK-17529][CORE] Implement BitSet.clearUntil and use it during merge joins

## What changes were proposed in this pull request?

Add a clearUntil() method on BitSet (adapted from the pre-existing setUntil() method).
Use this method to clear the subset of the BitSet which needs to be used during merge joins.

## How was this patch tested?

dev/run-tests, as well as performance tests on skewed data as described in jira.

I expect there to be a small local performance hit using BitSet.clearUntil rather than BitSet.clear for normally shaped (unskewed) joins (additional read on the last long). This is expected to be de-minimis and was not specifically tested.

Author: David Navas

Closes #15084 from davidnavas/bitSet.
---
 .../org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'sql/core')

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
index b46af2a99a..81b3e1d224 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
@@ -954,12 +954,12 @@ private class SortMergeFullOuterJoinScanner(
     }
 
     if (leftMatches.size <= leftMatched.capacity) {
-      leftMatched.clear()
+      leftMatched.clearUntil(leftMatches.size)
     } else {
       leftMatched = new BitSet(leftMatches.size)
     }
     if (rightMatches.size <= rightMatched.capacity) {
-      rightMatched.clear()
+      rightMatched.clearUntil(rightMatches.size)
     } else {
       rightMatched = new BitSet(rightMatches.size)
     }
-- 
cgit v1.2.3
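
The view above is limited to 'sql/core', so the `clearUntil` implementation itself is not shown. As a rough illustration of the idea in the description (zero only the words that cover the first `bitIndex` bits, then mask the partially covered last word), here is a minimal standalone sketch. `SketchBitSet` and `ClearUntilDemo` are hypothetical names for this example, and the code is an assumption about how such a method could work, not Spark's actual source.

```scala
import java.util.Arrays

// Hypothetical, simplified word-backed bit set; a stand-in, not Spark's BitSet.
final class SketchBitSet(numBits: Int) {
  // One Long stores 64 bits; round the word count up.
  private val words = new Array[Long]((numBits + 63) >> 6)

  def capacity: Int = words.length * 64

  def set(index: Int): Unit =
    words(index >> 6) |= 1L << (index & 0x3f)

  def get(index: Int): Boolean =
    (words(index >> 6) & (1L << (index & 0x3f))) != 0

  // Clears every bit: touches all words, however few are in use.
  def clear(): Unit = Arrays.fill(words, 0L)

  // Clears bits [0, bitIndex) and leaves higher bits untouched.
  def clearUntil(bitIndex: Int): Unit = {
    val wordIndex = bitIndex >> 6 // number of fully covered words
    Arrays.fill(words, 0, wordIndex, 0L)
    if (wordIndex < words.length) {
      // Keep only the bits at or above bitIndex in the last, partial word.
      // This read-modify-write is the "additional read on the last long".
      words(wordIndex) &= -1L << (bitIndex & 0x3f)
    }
  }
}

object ClearUntilDemo {
  def main(args: Array[String]): Unit = {
    val bits = new SketchBitSet(1024)
    bits.set(3); bits.set(70); bits.set(900)
    bits.clearUntil(100)
    assert(!bits.get(3) && !bits.get(70)) // below 100: cleared
    assert(bits.get(900))                 // above 100: left as-is
  }
}
```

In this sketch the cost of `clearUntil(n)` grows with `n` rather than with `capacity`. That is presumably why it helps on skewed data: after one very large match group has grown the bit set, later small groups no longer pay to zero the whole array. Bits at or above `n` may stay set, which is only safe if the caller never reads beyond the current match count.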