author | Wenchen Fan <cloud0fan@outlook.com> | 2015-08-02 23:41:16 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2015-08-02 23:41:16 -0700 |
commit | 608353c8e8e50461fafff91a2c885dca8af3aaa8 (patch) | |
tree | 2d33812459a03879c775fe4d5ecc1a34b50c5ac1 /unsafe | |
parent | 687c8c37150f4c93f8e57d86bb56321a4891286b (diff) | |
[SPARK-9404][SPARK-9542][SQL] unsafe array data and map data
This PR adds UnsafeArrayData. Currently it is encoded as follows:
the first 4 bytes hold the number of elements,
then come 4 bytes per element holding that element's start offset; a negative offset means the element is null,
followed by the elements themselves.
an example: [10, 11, 12, 13, null, 14] will be encoded as:
6, 28, 32, 36, 40, -44, 44, 10, 11, 12, 13, 14
Note that when we read an UnsafeArrayData from bytes, we can read the first 4 bytes as numElements and take the rest (with the first 4 bytes skipped) as the value region.
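The layout described above can be sketched in plain Java. This is only an illustration, not the code added by this PR: the class `UnsafeArraySketch` and its methods are hypothetical, it handles only nullable 4-byte ints, it uses `ByteBuffer` (big-endian) instead of raw memory access, and it assumes offsets are measured from the start of the buffer, as the numbers in the example suggest. The count word is derived from the array length, which is six for the example array.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of the described layout for nullable 4-byte ints:
// [numElements][one 4-byte offset per element][non-null values]
final class UnsafeArraySketch {

  static byte[] encode(Integer[] elements) {
    int n = elements.length;
    int nonNull = 0;
    for (Integer e : elements) {
      if (e != null) nonNull++;
    }
    ByteBuffer buf = ByteBuffer.allocate(4 + 4 * n + 4 * nonNull);
    buf.putInt(n);
    // Assumption: offsets count bytes from the start of the buffer.
    int cursor = 4 + 4 * n;
    for (Integer e : elements) {
      if (e == null) {
        buf.putInt(-cursor); // negative offset marks null; no value bytes are written
      } else {
        buf.putInt(cursor);
        cursor += 4;
      }
    }
    for (Integer e : elements) {
      if (e != null) buf.putInt(e);
    }
    return buf.array();
  }

  // Returns the element at the given ordinal, or null if its offset is negative.
  static Integer get(byte[] bytes, int ordinal) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    int offset = buf.getInt(4 + 4 * ordinal);
    return offset < 0 ? null : Integer.valueOf(buf.getInt(offset));
  }

  // Views the encoded bytes as 4-byte words, for comparison with the example.
  static int[] toWords(byte[] bytes) {
    int[] words = new int[bytes.length / 4];
    ByteBuffer.wrap(bytes).asIntBuffer().get(words);
    return words;
  }

  public static void main(String[] args) {
    byte[] bytes = encode(new Integer[] {10, 11, 12, 13, null, 14});
    System.out.println(Arrays.toString(toWords(bytes)));
    System.out.println(get(bytes, 4));
  }
}
```

Storing the negated offset for a null element keeps the position where the value would have started, so the offsets region alone is enough to tell null elements apart without a separate bitmap.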
Unsafe map data simply uses two unsafe array data: the first 4 bytes are the number of elements, the next 4 bytes are numBytes of the key array, and then follow the key array data and the value array data.
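A minimal sketch of that map layout, again in plain Java and not taken from this PR: `UnsafeMapSketch` and its methods are hypothetical, and the key and value arrays are treated as opaque, already-encoded byte arrays. It shows why only the key array's size needs to be stored: the value array's size is whatever remains after the 8-byte header and the key array.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of the described layout:
// [numElements (4 bytes)][numBytes of key array (4 bytes)][key array data][value array data]
final class UnsafeMapSketch {

  static byte[] pack(int numElements, byte[] keyArray, byte[] valueArray) {
    ByteBuffer buf = ByteBuffer.allocate(8 + keyArray.length + valueArray.length);
    buf.putInt(numElements);
    buf.putInt(keyArray.length);
    buf.put(keyArray);
    buf.put(valueArray);
    return buf.array();
  }

  static int numElements(byte[] map) {
    return ByteBuffer.wrap(map).getInt(0);
  }

  static byte[] keyArray(byte[] map) {
    int keyBytes = ByteBuffer.wrap(map).getInt(4);
    return Arrays.copyOfRange(map, 8, 8 + keyBytes);
  }

  // The value array has no stored size; it spans the rest of the buffer.
  static byte[] valueArray(byte[] map) {
    int keyBytes = ByteBuffer.wrap(map).getInt(4);
    return Arrays.copyOfRange(map, 8 + keyBytes, map.length);
  }

  public static void main(String[] args) {
    byte[] map = pack(2, new byte[] {1, 2, 3}, new byte[] {4, 5});
    System.out.println(numElements(map));
    System.out.println(Arrays.toString(keyArray(map)));
    System.out.println(Arrays.toString(valueArray(map)));
  }
}
```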
Author: Wenchen Fan <cloud0fan@outlook.com>
Closes #7752 from cloud-fan/unsafe-array and squashes the following commits:
3269bd7 [Wenchen Fan] fix a bug
6445289 [Wenchen Fan] add unit tests
49adf26 [Wenchen Fan] add unsafe map
20d1039 [Wenchen Fan] add comments and unsafe converter
821b8db [Wenchen Fan] add unsafe array
Diffstat (limited to 'unsafe')
-rw-r--r-- | unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java b/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
index 916825d007..f6c9b87778 100644
--- a/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
+++ b/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
@@ -43,6 +43,9 @@ public final class UTF8String implements Comparable<UTF8String>, Serializable {
   private final long offset;
   private final int numBytes;
 
+  public Object getBaseObject() { return base; }
+  public long getBaseOffset() { return offset; }
+
   private static int[] bytesOfCodePointInUTF8 = {2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
     2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
     3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,