author | Reynold Xin <rxin@databricks.com> | 2015-07-31 23:55:16 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2015-07-31 23:55:16 -0700 |
commit | d90f2cf7a2a1d1e69f9ab385f35f62d4091b5302 (patch) | |
tree | 94dff8456047924b32f7295dca1e7f47702d5e16 /sql/catalyst | |
parent | 67ad4e21fc68336b0ad6f9a363fb5ebb51f592bf (diff) | |
[SPARK-9517][SQL] BytesToBytesMap should encode data the same way as UnsafeExternalSorter
BytesToBytesMap currently encodes key/value data in the following format:
```
8B key length, key data, 8B value length, value data
```
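To make the old layout concrete, here is a minimal Java sketch that writes and reads one record in the "8B key length, key data, 8B value length, value data" format. The names `OldKvLayout`, `encode`, `decodeKey` and `decodeValue` are hypothetical. The big-endian byte order of `java.nio.ByteBuffer` is an illustrative choice and does not describe how BytesToBytesMap actually lays out bytes in memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the old BytesToBytesMap record layout:
//   8B key length | key data | 8B value length | value data
// Byte order (ByteBuffer's default, big-endian) is for illustration only.
final class OldKvLayout {
  static byte[] encode(byte[] key, byte[] value) {
    ByteBuffer buf = ByteBuffer.allocate(8 + key.length + 8 + value.length);
    buf.putLong(key.length);   // 8-byte key length
    buf.put(key);              // key bytes
    buf.putLong(value.length); // 8-byte value length
    buf.put(value);            // value bytes
    return buf.array();
  }

  static byte[] decodeKey(byte[] record) {
    ByteBuffer buf = ByteBuffer.wrap(record);
    byte[] key = new byte[(int) buf.getLong()];
    buf.get(key);
    return key;
  }

  static byte[] decodeValue(byte[] record) {
    ByteBuffer buf = ByteBuffer.wrap(record);
    long keyLen = buf.getLong();
    buf.position(8 + (int) keyLen); // skip over the key bytes
    byte[] value = new byte[(int) buf.getLong()];
    buf.get(value);
    return value;
  }

  public static void main(String[] args) {
    byte[] rec = encode("k1".getBytes(StandardCharsets.UTF_8),
                        "value".getBytes(StandardCharsets.UTF_8));
    System.out.println(rec.length); // 8 + 2 + 8 + 5 = 23
  }
}
```

A reader has to consume two separate length fields to reach the end of a record. Nothing in the record states its total size up front.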
UnsafeExternalSorter, on the other hand, encodes data this way:
```
4B record length, data
```
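For comparison, here is a minimal sketch of a reader for the "4B record length, data" format. The 4-byte prefix alone tells the reader where each record ends, so records can be stored back to back. `LengthPrefixedRecords` and its methods are hypothetical names. The byte order is again only illustrative and is not taken from UnsafeExternalSorter.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the sorter-style record layout:
//   4B record length | data
// Big-endian byte order is an illustrative assumption.
final class LengthPrefixedRecords {
  static byte[] encode(byte[] data) {
    return ByteBuffer.allocate(4 + data.length).putInt(data.length).put(data).array();
  }

  // Splits a buffer of back-to-back records into their payloads,
  // using only each record's 4-byte length prefix.
  static List<byte[]> readAll(byte[] buffer) {
    ByteBuffer buf = ByteBuffer.wrap(buffer);
    List<byte[]> out = new ArrayList<>();
    while (buf.remaining() >= 4) {
      byte[] data = new byte[buf.getInt()];
      buf.get(data);
      out.add(data);
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] a = encode(new byte[] {1, 2, 3});
    byte[] b = encode(new byte[] {9});
    byte[] both = ByteBuffer.allocate(a.length + b.length).put(a).put(b).array();
    System.out.println(readAll(both).size()); // 2
  }
}
```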
As a result, we cannot pass records encoded by BytesToBytesMap directly into UnsafeExternalSorter for sorting. However, if we rearrange the data slightly, we can pass the key/value records directly into UnsafeExternalSorter:
```
4B key+value length, 4B key length, key data, value data
```
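The following sketch shows why the rearranged layout can be handed to a length-prefixed reader. The leading 4 bytes frame the whole key/value record, and the second 4-byte field lets the map split the key from the value. The sketch assumes that the leading length counts every byte after itself: the 4B key length, the key, and the value. The "key+value length" wording above does not settle whether the key-length field is included, so treat that detail as an assumption. `KvRecord` is a hypothetical name, and big-endian byte order is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the new layout:
//   4B total length | 4B key length | key data | value data
// ASSUMPTION: total length = 4 (key-length field) + key bytes + value bytes,
// so a generic "4B record length, data" reader consumes exactly one record.
final class KvRecord {
  static byte[] encode(byte[] key, byte[] value) {
    int payload = 4 + key.length + value.length;
    return ByteBuffer.allocate(4 + payload)
        .putInt(payload)    // the prefix a length-prefixed reader sees
        .putInt(key.length) // lets the map split key from value
        .put(key)
        .put(value)
        .array();
  }

  // Reads the record the way a "4B record length, data" reader would.
  static byte[] sorterPayload(byte[] record) {
    ByteBuffer buf = ByteBuffer.wrap(record);
    byte[] data = new byte[buf.getInt()];
    buf.get(data);
    return data;
  }

  static byte[] key(byte[] record) {
    ByteBuffer buf = ByteBuffer.wrap(record);
    buf.getInt(); // skip the total length
    byte[] key = new byte[buf.getInt()];
    buf.get(key);
    return key;
  }

  // No value length is stored; it follows from the two length fields.
  static byte[] value(byte[] record) {
    ByteBuffer buf = ByteBuffer.wrap(record);
    int payload = buf.getInt();
    int keyLen = buf.getInt();
    return Arrays.copyOfRange(record, 8 + keyLen, 4 + payload);
  }

  public static void main(String[] args) {
    byte[] rec = encode("k1".getBytes(StandardCharsets.UTF_8),
                        "value".getBytes(StandardCharsets.UTF_8));
    System.out.println(rec.length); // 4 + 4 + 2 + 5 = 15
  }
}
```

In this layout the value length is no longer stored explicitly. In the sketch it is derived as the total length minus 4 minus the key length.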
Author: Reynold Xin <rxin@databricks.com>
Closes #7845 from rxin/kvsort-rebase and squashes the following commits:
5716b59 [Reynold Xin] Fixed test.
2e62ccb [Reynold Xin] Updated BytesToBytesMap's data encoding to put the key first.
a51b641 [Reynold Xin] Added a KV sorter interface.
Diffstat (limited to 'sql/catalyst')
-rw-r--r-- | sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeKeyValueSorter.java | 30 |
1 files changed, 30 insertions, 0 deletions
```diff
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeKeyValueSorter.java b/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeKeyValueSorter.java
new file mode 100644
index 0000000000..59c774da74
--- /dev/null
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeKeyValueSorter.java
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution;
+
+import java.io.IOException;
+
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
+import org.apache.spark.unsafe.KVIterator;
+
+public abstract class UnsafeKeyValueSorter {
+
+  public abstract void insert(UnsafeRow key, UnsafeRow value);
+
+  public abstract KVIterator<UnsafeRow, UnsafeRow> sort() throws IOException;
+}
```
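The new `UnsafeKeyValueSorter` class defines only an insert-then-sort contract. To illustrate that contract, here is a hypothetical in-memory analogue. It uses `byte[]` keys and values instead of `UnsafeRow`, and a small `Cursor` class instead of Spark's `KVIterator`. It orders keys by unsigned lexicographic byte comparison, which is an assumption made for this example. It does not spill to disk and does not reflect how Spark implements the interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical analogue of the insert-then-sort contract; not Spark code.
final class InMemoryKeyValueSorter {
  // Minimal cursor: call next() before each getKey()/getValue().
  static final class Cursor {
    private final List<byte[][]> entries;
    private int index = -1;

    Cursor(List<byte[][]> entries) { this.entries = entries; }
    boolean next() { return ++index < entries.size(); }
    byte[] getKey() { return entries.get(index)[0]; }
    byte[] getValue() { return entries.get(index)[1]; }
  }

  private final List<byte[][]> buffer = new ArrayList<>();

  void insert(byte[] key, byte[] value) {
    // Copy so later changes to the caller's arrays cannot affect the output.
    buffer.add(new byte[][] {key.clone(), value.clone()});
  }

  Cursor sort() {
    List<byte[][]> sorted = new ArrayList<>(buffer);
    sorted.sort((a, b) -> compareUnsigned(a[0], b[0]));
    return new Cursor(sorted);
  }

  // Lexicographic comparison that treats each byte as unsigned (0..255).
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) return cmp;
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    InMemoryKeyValueSorter sorter = new InMemoryKeyValueSorter();
    sorter.insert(new byte[] {2}, new byte[] {20});
    sorter.insert(new byte[] {1}, new byte[] {10});
    Cursor c = sorter.sort();
    while (c.next()) {
      System.out.println(c.getKey()[0] + " -> " + c.getValue()[0]); // prints "1 -> 10", then "2 -> 20"
    }
  }
}
```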