author | Tejas Patil <tejasp@fb.com> | 2016-08-23 18:48:08 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-08-23 18:48:08 -0700 |
commit | c1937dd19a23bd096a4707656c7ba19fb5c16966 (patch) | |
tree | 3deadf3f23ea3f9b93efe0a3d358cd8a742cbc31 /sql/hive | |
parent | bf8ff833e30b39e5e5e35ba8dcac31b79323838c (diff) | |
[SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader`
## What changes were proposed in this pull request?
Jira: https://issues.apache.org/jira/browse/SPARK-16862
The `BufferedInputStream` used in `UnsafeSorterSpillReader` reads data off disk with the default 8 KB buffer. This PR makes the buffer size configurable to improve disk read performance. I have set the default value to 1 MB, since I observed improved performance with that value.
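To illustrate why the buffer size matters, here is a minimal sketch that is not taken from the Spark code. It wraps an in-memory stream in a `BufferedInputStream` with a caller-supplied buffer size and counts how many reads reach the underlying stream. The class `SpillReadSketch`, the helper `CountingInputStream`, and the constant `DEFAULT_BUFFER_SIZE_BYTES` are hypothetical names chosen for this example, and the read count is only a rough proxy for disk access cost.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class SpillReadSketch {
    // Hypothetical constant mirroring the 1 MB default described above.
    static final int DEFAULT_BUFFER_SIZE_BYTES = 1024 * 1024;

    /** Hypothetical helper: counts read calls that reach the wrapped stream. */
    static final class CountingInputStream extends InputStream {
        private final InputStream in;
        int readCalls = 0;

        CountingInputStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            readCalls++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            readCalls++;
            return in.read(b, off, len);
        }
    }

    /**
     * Reads all of {@code data} one byte at a time through a BufferedInputStream
     * with the given buffer size and returns the number of underlying reads.
     */
    static int countUnderlyingReads(byte[] data, int bufferSizeBytes) {
        CountingInputStream counting =
            new CountingInputStream(new ByteArrayInputStream(data));
        try (InputStream in = new BufferedInputStream(counting, bufferSizeBytes)) {
            while (in.read() != -1) {
                // Drain the stream; the buffer is refilled whenever it runs empty.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counting.readCalls;
    }

    public static void main(String[] args) {
        byte[] data = new byte[4 * 1024 * 1024]; // stand-in for a spill file
        System.out.println("8 KB buffer reads: "
            + countUnderlyingReads(data, 8 * 1024));
        System.out.println("1 MB buffer reads: "
            + countUnderlyingReads(data, DEFAULT_BUFFER_SIZE_BYTES));
    }
}
```

A larger buffer should result in fewer, larger reads against the underlying stream. Whether that translates into a measurable gain depends on the workload and storage, which is presumably why the size is exposed as a setting rather than hard-coded.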
## How was this patch tested?
I am relying on the existing unit tests.
## Performance
After deploying this change to prod and setting the config to 1 MB, there was a 12% reduction in CPU time and a 19.5% reduction in CPU reservation time.
Author: Tejas Patil <tejasp@fb.com>
Closes #14726 from tejasapatil/spill_buffer_2.
Diffstat (limited to 'sql/hive')
0 files changed, 0 insertions, 0 deletions