diff options
author | Nathan Howell <nhowell@godaddy.com> | 2016-12-01 21:40:49 -0800 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-12-01 21:40:49 -0800 |
commit | c82f16c15e0d4bfc54fb890a667d9164a088b5c6 (patch) | |
tree | 64d4c58b9c780e27135e33bfc9667abbc5d85ea0 /common/unsafe/src/main/java | |
parent | d3c90b74edecc527ee468bead41d1cca0b667668 (diff) | |
download | spark-c82f16c15e0d4bfc54fb890a667d9164a088b5c6.tar.gz spark-c82f16c15e0d4bfc54fb890a667d9164a088b5c6.tar.bz2 spark-c82f16c15e0d4bfc54fb890a667d9164a088b5c6.zip |
[SPARK-18658][SQL] Write text records directly to a FileOutputStream
## What changes were proposed in this pull request?
This replaces uses of `TextOutputFormat` with an `OutputStream`, which will either write directly to the filesystem or indirectly via a compressor (if so configured). This avoids intermediate buffering.
The inverse of this (reading directly from a stream) is necessary for streaming large JSON records (when `wholeFile` is enabled) so I wanted to keep the read and write paths symmetric.
## How was this patch tested?
Existing unit tests.
Author: Nathan Howell <nhowell@godaddy.com>
Closes #16089 from NathanHowell/SPARK-18658.
Diffstat (limited to 'common/unsafe/src/main/java')
-rw-r--r-- | common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java | 19 |
1 files changed, 19 insertions, 0 deletions
diff --git a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java index e09a6b7d93..0255f53113 100644 --- a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java +++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java @@ -147,6 +147,25 @@ public final class UTF8String implements Comparable<UTF8String>, Externalizable, buffer.position(pos + numBytes); } + public void writeTo(OutputStream out) throws IOException { + if (base instanceof byte[] && offset >= BYTE_ARRAY_OFFSET) { + final byte[] bytes = (byte[]) base; + + // the offset includes an object header... this is only needed for unsafe copies + final long arrayOffset = offset - BYTE_ARRAY_OFFSET; + + // verify that the offset and length points somewhere inside the byte array + // and that the offset can safely be truncated to a 32-bit integer + if ((long) bytes.length < arrayOffset + numBytes) { + throw new ArrayIndexOutOfBoundsException(); + } + + out.write(bytes, (int) arrayOffset, numBytes); + } else { + out.write(getBytes()); + } + } + /** * Returns the number of bytes for a code point with the first byte as `b` * @param b The first byte of a code point |