author Wenchen Fan <cloud0fan@outlook.com> 2015-08-02 23:41:16 -0700
committer Reynold Xin <rxin@databricks.com> 2015-08-02 23:41:16 -0700
commit 608353c8e8e50461fafff91a2c885dca8af3aaa8
tree 2d33812459a03879c775fe4d5ecc1a34b50c5ac1 /unsafe
parent 687c8c37150f4c93f8e57d86bb56321a4891286b
[SPARK-9404][SPARK-9542][SQL] unsafe array data and map data
This PR adds UnsafeArrayData. It is currently encoded as follows: the first 4 bytes hold the number of elements; each following 4-byte slot holds the start offset of one element, unless the value is negative, in which case the element is null; the elements themselves follow the offsets.

For example, [10, 11, 12, 13, null, 14] is encoded as:

6, 28, 32, 36, 40, -44, 44, 10, 11, 12, 13, 14

Note that when we read an UnsafeArrayData from bytes, we can read the first 4 bytes as numElements and take the rest (with the first 4 bytes skipped) as the value region.

Unsafe map data simply uses two unsafe array data: the first 4 bytes are the number of elements, the second 4 bytes are the numBytes of the key array, and then the key array data and the value array data follow.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #7752 from cloud-fan/unsafe-array and squashes the following commits:

3269bd7 [Wenchen Fan] fix a bug
6445289 [Wenchen Fan] add unit tests
49adf26 [Wenchen Fan] add unsafe map
20d1039 [Wenchen Fan] add comments and unsafe converter
821b8db [Wenchen Fan] add unsafe array
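
The array layout described in the commit message can be illustrated with a small standalone sketch. This is not the UnsafeArrayData code from the PR: the class name `UnsafeArrayLayoutSketch` and its methods are hypothetical, it handles only 4-byte int elements, and it uses a `ByteBuffer` rather than Spark's unsafe memory access. It also assumes, based on the example's first offset (28 = 4 + 6 × 4 for six offset slots), that offsets are measured from the start of the buffer including the element-count header, and that a null element stores the negated position at which the next element begins.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the layout in the commit message; not Spark's implementation.
final class UnsafeArrayLayoutSketch {

  // Layout: [numElements (4 bytes)][one 4-byte offset per element][non-null values].
  static ByteBuffer encode(Integer[] values) {
    int n = values.length;
    int nonNull = 0;
    for (Integer v : values) {
      if (v != null) nonNull++;
    }
    int headerBytes = 4 + 4 * n;  // count word plus the offset table
    ByteBuffer buf = ByteBuffer.allocate(headerBytes + 4 * nonNull);
    buf.putInt(0, n);
    int cursor = headerBytes;     // assumption: offsets are relative to the buffer start
    for (int i = 0; i < n; i++) {
      if (values[i] == null) {
        // Null element: store a negative offset and write no value bytes.
        buf.putInt(4 + 4 * i, -cursor);
      } else {
        buf.putInt(4 + 4 * i, cursor);
        buf.putInt(cursor, values[i]);
        cursor += 4;
      }
    }
    return buf;
  }

  static int numElements(ByteBuffer buf) {
    return buf.getInt(0);
  }

  // Returns the element at the given ordinal, or null if its offset is negative.
  static Integer get(ByteBuffer buf, int ordinal) {
    int offset = buf.getInt(4 + 4 * ordinal);
    return offset < 0 ? null : buf.getInt(offset);
  }
}
```

Under these assumptions, encoding `[10, 11, 12, 13, null, 14]` produces a 48-byte buffer whose 4-byte words are the count, the six offsets 28, 32, 36, 40, -44, 44, and the five non-null values. Keeping the position in the negated offset of a null element means the end of the preceding element can still be recovered, which would matter for variable-length element types.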
Diffstat (limited to 'unsafe')
-rw-r--r-- unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java 3
1 files changed, 3 insertions, 0 deletions
diff --git a/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java b/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
index 916825d007..f6c9b87778 100644
--- a/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
+++ b/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
@@ -43,6 +43,9 @@ public final class UTF8String implements Comparable<UTF8String>, Serializable {
   private final long offset;
   private final int numBytes;
+  public Object getBaseObject() { return base; }
+  public long getBaseOffset() { return offset; }
+
   private static int[] bytesOfCodePointInUTF8 = {2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
     2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
     3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,