[SPARK-13582] [SQL] defer dictionary decoding in parquet reader - spark

author	Davies Liu <davies@databricks.com>	2016-03-01 13:07:04 -0800
committer	Davies Liu <davies.liu@gmail.com>	2016-03-01 13:07:04 -0800
commit	c27ba0d547a0cd3fd00bb42c76ad971b2d48b4a0 (patch)
tree	f529168194ef53ded5cda96b0353f62fcd9bcad7 /.rat-excludes
parent	c37bbb3a1cbd93c749aaaeca1345817e0c20094f (diff)
download	spark-c27ba0d547a0cd3fd00bb42c76ad971b2d48b4a0.tar.gz spark-c27ba0d547a0cd3fd00bb42c76ad971b2d48b4a0.tar.bz2 spark-c27ba0d547a0cd3fd00bb42c76ad971b2d48b4a0.zip

[SPARK-13582] [SQL] defer dictionary decoding in parquet reader

## What changes were proposed in this pull request?

This PR defers the resolution of a dictionary ID to its value until the column is actually accessed (inside getInt/getLong). This is very useful for columns and rows that are filtered out, because their values are never decoded. It is also useful for binary types, since we no longer need to copy all the byte arrays.

This PR also changes the underlying type for small decimals that fit within an Int, so that getInt() can be used to look up the value from IntDictionary.

## How was this patch tested?

Manually tested TPCDS Q7 with scale factor 10 and saw about a 30% improvement (after PR #11274).

Author: Davies Liu <davies@databricks.com>

Closes #11437 from davies/decode_dict.
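To illustrate the idea of deferring dictionary decoding, here is a minimal sketch in Python. It is not the Spark implementation (which lives in the Java Parquet reader). The class `DictionaryEncodedColumn`, the `lookups` counter, and `eager_decode` are hypothetical names introduced only for this example. The column keeps the dictionary IDs as read and resolves an ID to its value only when `get_int` is called.

```python
from typing import List, Sequence


class DictionaryEncodedColumn:
    """Hypothetical column that stores dictionary IDs and decodes on access."""

    def __init__(self, dictionary: Sequence[int], ids: Sequence[int]) -> None:
        self.dictionary = dictionary  # ID -> value table for the column
        self.ids = list(ids)          # one dictionary ID per row, left undecoded
        self.lookups = 0              # counts lookups, for illustration only

    def get_int(self, row: int) -> int:
        # Decoding is deferred to this point: the ID is resolved only
        # for rows that are actually read.
        self.lookups += 1
        return self.dictionary[self.ids[row]]


def eager_decode(dictionary: Sequence[int], ids: Sequence[int]) -> List[int]:
    """Baseline for comparison: decode every row up front."""
    return [dictionary[i] for i in ids]


dictionary = [100, 200, 300]
ids = [0, 2, 1, 2, 0, 1]
column = DictionaryEncodedColumn(dictionary, ids)

# Assume a filter on some other column kept only rows 1 and 3.
selected_rows = [1, 3]
values = [column.get_int(row) for row in selected_rows]
```

In this sketch only the two surviving rows trigger a dictionary lookup, whereas `eager_decode` would resolve all six rows regardless of the filter. The same reasoning applies to binary columns, where each avoided lookup also avoids copying a byte array.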

Diffstat (limited to '.rat-excludes')

0 files changed, 0 insertions, 0 deletions
