Diffstat (limited to 'core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala')
-rw-r--r--  core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala  30
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala
index 2440139ac9..44b1d90667 100644
--- a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala
+++ b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala
@@ -67,24 +67,24 @@ import org.apache.spark.storage.{BlockId, DiskBlockObjectWriter}
  *
  * At a high level, this class works internally as follows:
  *
- * - We repeatedly fill up buffers of in-memory data, using either a PartitionedAppendOnlyMap if
- *   we want to combine by key, or a PartitionedPairBuffer if we don't.
- *   Inside these buffers, we sort elements by partition ID and then possibly also by key.
- *   To avoid calling the partitioner multiple times with each key, we store the partition ID
- *   alongside each record.
+ *  - We repeatedly fill up buffers of in-memory data, using either a PartitionedAppendOnlyMap if
+ *    we want to combine by key, or a PartitionedPairBuffer if we don't.
+ *    Inside these buffers, we sort elements by partition ID and then possibly also by key.
+ *    To avoid calling the partitioner multiple times with each key, we store the partition ID
+ *    alongside each record.
  *
- * - When each buffer reaches our memory limit, we spill it to a file. This file is sorted first
- *   by partition ID and possibly second by key or by hash code of the key, if we want to do
- *   aggregation. For each file, we track how many objects were in each partition in memory, so we
- *   don't have to write out the partition ID for every element.
+ *  - When each buffer reaches our memory limit, we spill it to a file. This file is sorted first
+ *    by partition ID and possibly second by key or by hash code of the key, if we want to do
+ *    aggregation. For each file, we track how many objects were in each partition in memory, so we
+ *    don't have to write out the partition ID for every element.
  *
- * - When the user requests an iterator or file output, the spilled files are merged, along with
- *   any remaining in-memory data, using the same sort order defined above (unless both sorting
- *   and aggregation are disabled). If we need to aggregate by key, we either use a total ordering
- *   from the ordering parameter, or read the keys with the same hash code and compare them with
- *   each other for equality to merge values.
+ *  - When the user requests an iterator or file output, the spilled files are merged, along with
+ *    any remaining in-memory data, using the same sort order defined above (unless both sorting
+ *    and aggregation are disabled). If we need to aggregate by key, we either use a total ordering
+ *    from the ordering parameter, or read the keys with the same hash code and compare them with
+ *    each other for equality to merge values.
  *
- * - Users are expected to call stop() at the end to delete all the intermediate files.
+ *  - Users are expected to call stop() at the end to delete all the intermediate files.
  */
 private[spark] class ExternalSorter[K, V, C](
     context: TaskContext,
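
To make the sort-spill-merge cycle described in the scaladoc above concrete, here is a minimal, self-contained Scala sketch of the same idea, not Spark's actual implementation: each record is tagged with its partition ID exactly once, buffered records are sorted by (partition ID, key) when a pretend memory limit forces a "spill", and the sorted runs are then merged while values for equal keys are combined. All names in the sketch (ToySorter, insert, spillThreshold) are hypothetical.

import scala.collection.mutable.ArrayBuffer

object ToySorter {
  type Record = ((Int, String), Int) // ((partition ID, key), value)

  def main(args: Array[String]): Unit = {
    val numPartitions = 2
    def partition(key: String): Int = math.abs(key.hashCode) % numPartitions

    val spillThreshold = 3                      // pretend memory limit
    val spills = ArrayBuffer.empty[Seq[Record]] // stand-ins for spill files
    var buffer = ArrayBuffer.empty[Record]

    def insert(key: String, value: Int): Unit = {
      // Call the partitioner once and store the partition ID with the record.
      buffer += (((partition(key), key), value))
      if (buffer.size >= spillThreshold) {
        // "Spill" a run sorted first by partition ID and then by key.
        spills += buffer.sortBy { case ((p, k), _) => (p, k) }.toSeq
        buffer = ArrayBuffer.empty[Record]
      }
    }

    Seq("a" -> 1, "b" -> 2, "a" -> 3, "c" -> 4, "b" -> 5).foreach {
      case (k, v) => insert(k, v)
    }

    // Merge the spilled runs plus the remaining in-memory data under the same
    // (partition ID, key) order, combining values for equal keys by summing.
    // (Real code would stream a k-way merge instead of re-sorting globally.)
    val remaining = buffer.sortBy { case ((p, k), _) => (p, k) }.toSeq
    val sorted = (spills :+ remaining).flatten
      .sortBy { case ((p, k), _) => (p, k) }
    val merged = ArrayBuffer.empty[Record]
    for (rec <- sorted) {
      val ((p, k), v) = rec
      merged.lastOption match {
        case Some(((lp, lk), lv)) if lp == p && lk == k =>
          merged(merged.size - 1) = ((p, k), lv + v) // aggregate equal keys
        case _ => merged += rec
      }
    }
    merged.foreach { case ((p, k), v) => println(s"partition=$p key=$k value=$v") }
  }
}

The real ExternalSorter streams a k-way merge over the spill files and the in-memory buffer rather than re-sorting everything, but the ordering invariant is the one the scaladoc describes: sorted by partition ID first and possibly by key second.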