author | Nong Li <nong@databricks.com> | 2016-02-26 12:43:50 -0800 |
---|---|---|
committer | Davies Liu <davies.liu@gmail.com> | 2016-02-26 12:43:50 -0800 |
commit | 0598a2b81d1426dd2cf9e6fc32cef345364d18c6 (patch) | |
tree | a7341a42f902110e317a895968d7df7cd5e6ada4 /streaming/pom.xml | |
parent | 6df1e55a6594ae4bc7882f44af8d230aad9489b4 (diff) | |
[SPARK-13499] [SQL] Performance improvements for parquet reader.
## What changes were proposed in this pull request?
This patch includes these performance fixes:
- Remove unnecessary setNotNull() calls. The NULL bits are cleared already.
- Speed up RLE group decoding.
- Speed up dictionary decoding by decoding NULLs directly into the result.
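The three items above can be illustrated with a minimal sketch. This is not the code from the patch: the class `DictionaryDecodeSketch`, its `Column` type, and the `{level, count}` run representation are hypothetical names chosen for illustration, and the sketch assumes a flat column where definition level 0 means NULL and 1 means a value is present. It handles each run of definition levels as a group, writes NULLs straight into the output, and makes no per-row "set not null" call because a freshly allocated null mask is already clear.

```java
import java.util.Arrays;

// Illustrative sketch only; names and data layout are assumptions, not Spark's API.
final class DictionaryDecodeSketch {

    /** Decoded column: values plus a parallel null mask. */
    static final class Column {
        final int[] values;
        final boolean[] isNull; // all false on allocation, so "not null" needs no explicit call

        Column(int numRows) {
            values = new int[numRows];
            isNull = new boolean[numRows];
        }
    }

    /**
     * Decodes a dictionary-encoded column in one pass.
     *
     * @param levelRuns     RLE runs of definition levels, each {level, count}; level 0 = NULL
     * @param dictionaryIds dictionary ids for the non-null rows only, in row order
     * @param dictionary    dictionary values indexed by id
     */
    static Column decode(int[][] levelRuns, int[] dictionaryIds, int[] dictionary) {
        int numRows = 0;
        for (int[] run : levelRuns) {
            numRows += run[1];
        }
        Column out = new Column(numRows);

        int row = 0;    // next output row
        int nextId = 0; // next unread dictionary id
        for (int[] run : levelRuns) {
            int level = run[0];
            int count = run[1];
            if (level == 0) {
                // A whole run of NULLs is written directly into the result in one call.
                Arrays.fill(out.isNull, row, row + count, true);
            } else {
                // A whole run of present values: only the dictionary lookup, no null-bit update.
                for (int i = 0; i < count; i++) {
                    out.values[row + i] = dictionary[dictionaryIds[nextId++]];
                }
            }
            row += count;
        }
        return out;
    }

    public static void main(String[] args) {
        Column c = decode(new int[][] {{1, 1}, {0, 1}, {1, 2}, {0, 1}},
                          new int[] {2, 0, 1},
                          new int[] {10, 20, 30});
        System.out.println(Arrays.toString(c.values));
        System.out.println(Arrays.toString(c.isNull));
    }
}
```

Working on runs rather than single rows keeps the branch on the definition level out of the inner loop, which is the general idea behind treating RLE groups as a unit.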
## How was this patch tested?
In addition to the updated benchmarks, these changes were run on TPCDS Q55 (sf40),
with the following result:
```
TPCDS:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
---------------------------------------------------------------------------------
q55 (Before)                             6398 / 6616         18.0          55.5
q55 (After)                              4983 / 5189         23.1          43.3
```
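To put the table in relative terms, the following small sketch derives the speedup and the reduction in wall time from the reported timings. The class and method names (`BenchmarkDelta`, `speedup`, `percentReduction`) are hypothetical helpers introduced here; only the four timing values come from the table.

```java
// Illustrative helper; not part of the patch or of Spark's benchmark tooling.
final class BenchmarkDelta {

    /** Ratio of the old time to the new time; values above 1 mean the new run is faster. */
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    /** Percentage of wall time saved relative to the old time. */
    static double percentReduction(double beforeMs, double afterMs) {
        return 100.0 * (beforeMs - afterMs) / beforeMs;
    }

    public static void main(String[] args) {
        // Best and average times for q55, taken from the table above.
        System.out.printf("best: %.2fx faster, %.1f%% less time%n",
                speedup(6398, 4983), percentReduction(6398, 4983));
        System.out.printf("avg:  %.2fx faster, %.1f%% less time%n",
                speedup(6616, 5189), percentReduction(6616, 5189));
    }
}
```

On these numbers the best-case run comes out at roughly 1.28x faster, which is in line with the change in the reported rate from 18.0 to 23.1 M rows/s.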
Author: Nong Li <nong@databricks.com>
Closes #11375 from nongli/spark-13499.