[SPARK-9517][SQL] BytesToBytesMap should encode data the same way as UnsafeExternalSorter

BytesToBytesMap currently encodes key/value data in the following format:
```
8B key length, key data, 8B value length, value data
```

UnsafeExternalSorter, on the other hand, encodes data this way:
```
4B record length, data
```

As a result, we cannot pass records encoded by BytesToBytesMap directly into UnsafeExternalSorter for sorting. However, if we rearrange the data slightly, we can pass the key/value records directly into UnsafeExternalSorter:
```
4B key+value length, 4B key length, key data, value data
```

Author: Reynold Xin <rxin@databricks.com>

Closes #7845 from rxin/kvsort-rebase and squashes the following commits:

5716b59 [Reynold Xin] Fixed test.
2e62ccb [Reynold Xin] Updated BytesToBytesMap's data encoding to put the key first.
a51b641 [Reynold Xin] Added a KV sorter interface.
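
The rearranged layout in the commit message above can be illustrated with a small standalone sketch. This is not Spark's implementation: the class `KvRecordLayoutSketch` and its methods are hypothetical, the little-endian byte order is an assumption, and the first field is taken literally from the message as key length plus value length (the message does not say whether it also counts the 4-byte key-length field). The sketch writes one record, reads the key and value back, and reproduces the 72-byte record size used by the updated test.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Spark.
class KvRecordLayoutSketch {

    // Layout: [4B key+value length][4B key length][key bytes][value bytes].
    public static byte[] encode(byte[] key, byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + key.length + value.length)
            .order(ByteOrder.LITTLE_ENDIAN); // byte order is an assumption
        buf.putInt(key.length + value.length); // total payload length
        buf.putInt(key.length);                // key length
        buf.put(key);
        buf.put(value);
        return buf.array();
    }

    // The key starts right after the two 4-byte length fields.
    public static byte[] keyOf(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record).order(ByteOrder.LITTLE_ENDIAN);
        int keyLen = buf.getInt(4);
        return Arrays.copyOfRange(record, 8, 8 + keyLen);
    }

    // The value length is implied: total length minus key length.
    public static byte[] valueOf(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record).order(ByteOrder.LITTLE_ENDIAN);
        int total = buf.getInt(0);
        int keyLen = buf.getInt(4);
        return Arrays.copyOfRange(record, 8 + keyLen, 8 + total);
    }

    public static void main(String[] args) {
        // Same sizes as the updated test: 24-byte key, 40-byte value.
        byte[] record = encode(new byte[24], new byte[40]);
        long pageSize = 64L * 1024 * 1024; // the 64-megabyte page from the test comment
        long leftover = pageSize % record.length;
        System.out.println("record size = " + record.length + " bytes");
        System.out.println("bytes left over at the end of a page = " + leftover);
    }
}
```

With a 24-byte key and a 40-byte value the record occupies 4 + 4 + 24 + 40 = 72 bytes, the same total as the old 8 + 8 + 16 + 40 layout. That is why the test below raises `KEY_LENGTH` from 16 to 24: the record size stays at 72 bytes, which does not divide a 64-megabyte page evenly, so the wasted-space branch in `iterator()` is still exercised.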
author: Reynold Xin <rxin@databricks.com> 2015-07-31 23:55:16 -0700
committer: Reynold Xin <rxin@databricks.com> 2015-07-31 23:55:16 -0700
commit: d90f2cf7a2a1d1e69f9ab385f35f62d4091b5302 (patch)
tree: 94dff8456047924b32f7295dca1e7f47702d5e16 /core/src/test/java
parent: 67ad4e21fc68336b0ad6f9a363fb5ebb51f592bf (diff)
download: spark-d90f2cf7a2a1d1e69f9ab385f35f62d4091b5302.tar.gz
spark-d90f2cf7a2a1d1e69f9ab385f35f62d4091b5302.tar.bz2
spark-d90f2cf7a2a1d1e69f9ab385f35f62d4091b5302.zip
1 files changed, 3 insertions, 3 deletions
diff --git a/core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java b/core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java
index 60f483acbc..70f8ca4d21 100644
--- a/core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java
+++ b/core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java
@@ -243,17 +243,17 @@ public abstract class AbstractBytesToBytesMapSuite {
   @Test
   public void iteratingOverDataPagesWithWastedSpace() throws Exception {
     final int NUM_ENTRIES = 1000 * 1000;
-    final int KEY_LENGTH = 16;
+    final int KEY_LENGTH = 24;
     final int VALUE_LENGTH = 40;
     final BytesToBytesMap map = new BytesToBytesMap(
       taskMemoryManager, shuffleMemoryManager, NUM_ENTRIES, PAGE_SIZE_BYTES);
-    // Each record will take 8 + 8 + 16 + 40 = 72 bytes of space in the data page. Our 64-megabyte
+    // Each record will take 8 + 24 + 40 = 72 bytes of space in the data page. Our 64-megabyte
     // pages won't be evenly-divisible by records of this size, which will cause us to waste some
     // space at the end of the page. This is necessary in order for us to take the end-of-record
     // handling branch in iterator().
     try {
       for (int i = 0; i < NUM_ENTRIES; i++) {
-        final long[] key = new long[] { i, i };  // 2 * 8 = 16 bytes
+        final long[] key = new long[] { i, i, i };  // 3 * 8 = 24 bytes
         final long[] value = new long[] { i, i, i, i, i }; // 5 * 8 = 40 bytes
         final BytesToBytesMap.Location loc = map.lookup(
           key,