author Nong Li <nong@databricks.com> 2016-02-26 12:43:50 -0800
committer Davies Liu <davies.liu@gmail.com> 2016-02-26 12:43:50 -0800
commit 0598a2b81d1426dd2cf9e6fc32cef345364d18c6
tree a7341a42f902110e317a895968d7df7cd5e6ada4 /project
parent 6df1e55a6594ae4bc7882f44af8d230aad9489b4
[SPARK-13499] [SQL] Performance improvements for parquet reader.

## What changes were proposed in this pull request?

This patch includes these performance fixes:

- Remove unnecessary setNotNull() calls. The NULL bits are cleared already.
- Speed up RLE group decoding.
- Speed up dictionary decoding by decoding NULLs directly into the result.

## How was this patch tested?

In addition to the updated benchmarks, on TPCDS, the result of these changes running Q55 (sf40) is:

```
TPCDS:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
---------------------------------------------------------------------------------
q55 (Before)                             6398 / 6616         18.0          55.5
q55 (After)                              4983 / 5189         23.1          43.3
```

Author: Nong Li <nong@databricks.com>

Closes #11375 from nongli/spark-13499.
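
To make the size of the improvement explicit, the q55 figures above can be turned into speedup ratios. This is only a sketch of the arithmetic: the numbers are copied from the table, while the helper names (`speedup`, `reduction_pct`, `rate_m_per_s`) are illustrative and not part of Spark or its benchmark harness. The last helper assumes that Rate(M/s) is simply the inverse of Per Row(ns), which the table's values are consistent with.

```python
# Figures copied from the q55 (sf40) benchmark table in the commit message.
before = {"best_ms": 6398, "avg_ms": 6616, "ns_per_row": 55.5}
after = {"best_ms": 4983, "avg_ms": 5189, "ns_per_row": 43.3}


def speedup(old_time, new_time):
    """Ratio of old duration to new duration (values above 1 mean faster)."""
    return old_time / new_time


def reduction_pct(old_time, new_time):
    """Percentage of the old duration that was eliminated."""
    return 100.0 * (old_time - new_time) / old_time


def rate_m_per_s(ns_per_row):
    """Rows per second, in millions, implied by a per-row cost in nanoseconds.

    Assumption: Rate(M/s) in the table is the inverse of Per Row(ns).
    """
    return 1000.0 / ns_per_row


best_speedup = speedup(before["best_ms"], after["best_ms"])
avg_speedup = speedup(before["avg_ms"], after["avg_ms"])
best_reduction = reduction_pct(before["best_ms"], after["best_ms"])
avg_reduction = reduction_pct(before["avg_ms"], after["avg_ms"])

print(f"best-time speedup: {best_speedup:.3f}x ({best_reduction:.1f}% less time)")
print(f"avg-time speedup:  {avg_speedup:.3f}x ({avg_reduction:.1f}% less time)")
print(f"implied rate before: {rate_m_per_s(before['ns_per_row']):.1f} M rows/s")
print(f"implied rate after:  {rate_m_per_s(after['ns_per_row']):.1f} M rows/s")
```

Under these figures the query runs roughly 1.28x faster on both the best and the average time, which is a reduction of about 22% in wall-clock time.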
Diffstat (limited to 'project')
0 files changed, 0 insertions, 0 deletions