[SPARK-13499] [SQL] Performance improvements for parquet reader. - spark


author	Nong Li <nong@databricks.com>	2016-02-26 12:43:50 -0800
committer	Davies Liu <davies.liu@gmail.com>	2016-02-26 12:43:50 -0800
commit	0598a2b81d1426dd2cf9e6fc32cef345364d18c6 (patch)
tree	a7341a42f902110e317a895968d7df7cd5e6ada4 /project
parent	6df1e55a6594ae4bc7882f44af8d230aad9489b4 (diff)

[SPARK-13499] [SQL] Performance improvements for parquet reader.

## What changes were proposed in this pull request?

This patch includes these performance fixes:
- Remove unnecessary setNotNull() calls. The NULL bits are cleared already.
- Speed up RLE group decoding.
- Speed up dictionary decoding by decoding NULLs directly into the result.

## How was this patch tested?

In addition to the updated benchmarks, on TPCDS, the result of these changes running Q55 (sf40) is:

```
TPCDS:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
---------------------------------------------------------------------------------
q55 (Before)                             6398 / 6616         18.0          55.5
q55 (After)                              4983 / 5189         23.1          43.3
```

Author: Nong Li <nong@databricks.com>

Closes #11375 from nongli/spark-13499.
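The three fixes listed in the commit message can be illustrated with a small sketch. This is not the code from the patch (the diff is not shown on this page), and it is written in Python rather than the reader's own language for brevity. The function name `decode_dictionary_column` and its parameters are hypothetical. It assumes definition levels arrive as run-length-encoded `(level, run_length)` pairs and that dictionary ids are stored only for non-null rows.

```python
from typing import List, Sequence, Tuple


def decode_dictionary_column(
    def_level_runs: Sequence[Tuple[int, int]],  # hypothetical: (definition level, run length)
    dictionary_ids: Sequence[int],              # one id per non-null row
    dictionary: Sequence[int],
    max_def_level: int = 1,
) -> Tuple[List[int], List[bool]]:
    """Decode a dictionary-encoded column straight into the output buffers."""
    total_rows = sum(run_length for _, run_length in def_level_runs)
    values = [0] * total_rows
    # Null flags start cleared, so non-null rows need no per-row "set not null".
    is_null = [False] * total_rows

    row = 0      # next output row
    next_id = 0  # next unread dictionary id
    for level, run_length in def_level_runs:
        if level == max_def_level:
            # Whole run is non-null: look up each id and write the value in place.
            for i in range(run_length):
                values[row + i] = dictionary[dictionary_ids[next_id + i]]
            next_id += run_length
        else:
            # Whole run is null: mark the rows directly in the result,
            # without consuming any dictionary ids.
            for i in range(row, row + run_length):
                is_null[i] = True
        row += run_length

    if next_id != len(dictionary_ids):
        raise ValueError("dictionary id count does not match non-null row count")
    return values, is_null


values, is_null = decode_dictionary_column(
    def_level_runs=[(1, 2), (0, 3), (1, 1)],
    dictionary_ids=[0, 2, 1],
    dictionary=[10, 20, 30],
)
print(values)   # [10, 30, 0, 0, 0, 20]
print(is_null)  # [False, False, True, True, True, False]
```

Handling each run as a unit means the null check happens once per run instead of once per row, and writing nulls into the final buffers avoids a separate pass to merge values with null flags.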

Diffstat (limited to 'project')

0 files changed, 0 insertions, 0 deletions
