[SPARK-9517][SQL] BytesToBytesMap should encode data the same way as UnsafeExternalSorter - spark


author	Reynold Xin <rxin@databricks.com>	2015-07-31 23:55:16 -0700
committer	Reynold Xin <rxin@databricks.com>	2015-07-31 23:55:16 -0700
commit	d90f2cf7a2a1d1e69f9ab385f35f62d4091b5302 (patch)
tree	94dff8456047924b32f7295dca1e7f47702d5e16 /mllib
parent	67ad4e21fc68336b0ad6f9a363fb5ebb51f592bf (diff)

[SPARK-9517][SQL] BytesToBytesMap should encode data the same way as UnsafeExternalSorter

BytesToBytesMap currently encodes key/value data in the following format:
```
8B key length, key data, 8B value length, value data
```

UnsafeExternalSorter, on the other hand, encodes data this way:
```
4B record length, data
```

As a result, we cannot pass records encoded by BytesToBytesMap directly into UnsafeExternalSorter for sorting. However, if we rearrange the data slightly, we can pass the key/value records directly into UnsafeExternalSorter:
```
4B key+value length, 4B key length, key data, value data
```

Author: Reynold Xin <rxin@databricks.com>

Closes #7845 from rxin/kvsort-rebase and squashes the following commits:

5716b59 [Reynold Xin] Fixed test.
2e62ccb [Reynold Xin] Updated BytesToBytesMap's data encoding to put the key first.
a51b641 [Reynold Xin] Added a KV sorter interface.
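To make the layout change concrete, here is a minimal sketch that encodes one key/value pair in the old and new formats and then reads the new record back the way a sorter expecting "4B record length, data" would. It is not Spark's code: the class `KvRecordLayout` and its methods are hypothetical, and it uses `java.nio.ByteBuffer` (big-endian) instead of Spark's memory pages. It also assumes the leading length word counts everything after it (the 4B key length plus key and value bytes), so the sorter consumes exactly one record; the commit message's "key+value length" wording does not settle that detail.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical illustration of the SPARK-9517 record layouts.
 * Uses ByteBuffer rather than Spark's unsafe memory pages.
 */
class KvRecordLayout {

    /** Old BytesToBytesMap layout: 8B key length, key, 8B value length, value. */
    static byte[] encodeOld(byte[] key, byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(8 + key.length + 8 + value.length);
        buf.putLong(key.length).put(key).putLong(value.length).put(value);
        return buf.array();
    }

    /**
     * New layout: 4B record length, 4B key length, key, value.
     * Assumption: the record length covers the key length word, key and value,
     * so the bytes also parse as the sorter's "4B record length, data" format.
     */
    static byte[] encodeNew(byte[] key, byte[] value) {
        int recordLength = 4 + key.length + value.length;
        ByteBuffer buf = ByteBuffer.allocate(4 + recordLength);
        buf.putInt(recordLength).putInt(key.length).put(key).put(value);
        return buf.array();
    }

    /** Sorter-style view: read the 4B record length, then that many opaque bytes. */
    static byte[] readSorterRecord(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        int recordLength = buf.getInt();
        byte[] data = new byte[recordLength];
        buf.get(data);
        return data;
    }

    /** Split the opaque sorter payload back into key and value using the 4B key length. */
    static byte[][] splitKeyValue(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int keyLength = buf.getInt();
        byte[] key = new byte[keyLength];
        buf.get(key);
        byte[] value = new byte[buf.remaining()];
        buf.get(value);
        return new byte[][] {key, value};
    }

    public static void main(String[] args) {
        byte[] key = "k1".getBytes(StandardCharsets.UTF_8);
        byte[] value = "hello".getBytes(StandardCharsets.UTF_8);

        System.out.println("old size = " + encodeOld(key, value).length); // 8 + 2 + 8 + 5 = 23
        System.out.println("new size = " + encodeNew(key, value).length); // 4 + 4 + 2 + 5 = 15

        byte[][] kv = splitKeyValue(readSorterRecord(encodeNew(key, value)));
        System.out.println(new String(kv[0], StandardCharsets.UTF_8) + " -> "
                + new String(kv[1], StandardCharsets.UTF_8)); // k1 -> hello
    }
}
```

Because the key length sits inside the sorter's opaque payload, the sorter never needs to know the record is a key/value pair; only code that consumes the sorted output has to split it again.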

Diffstat (limited to 'mllib')

0 files changed, 0 insertions, 0 deletions

