author Nong Li <nong@databricks.com> 2016-04-09 17:45:10 -0700
committer Davies Liu <davies.liu@gmail.com> 2016-04-09 17:45:10 -0700
commit 5989c85b535f7f623392d6456d8b37052487f24b
tree 48fa983954f6e631b06954ec4177c45c7fcd84a6 /python
parent 5cb5edaf9c5054e42d41f20b2dd92dafcccbf0d6
[SPARK-14217] [SQL] Fix bug if parquet data has columns that use dictionary encoding for some of the data
## What changes were proposed in this pull request?

This PR is based on #12017.

Currently, Parquet data with columns that use dictionary encoding for only some of the data produces batches in which some values are dictionary encoded and some are not. The non-dictionary-encoded values cause us to remove the dictionary from the batch, so the first values return garbage. This patch fixes the issue by first decoding the dictionary for the values that are already dictionary encoded before switching. A similar thing is done for the reverse case, where the initial values are not dictionary encoded.

## How was this patch tested?

This is difficult to test, but the issue was replicated on a test cluster using a large TPC-DS data set.

Author: Nong Li <nong@databricks.com>
Author: Davies Liu <davies@databricks.com>

Closes #12279 from davies/fix_dict.
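
The decode-before-switching idea from the commit message can be illustrated with a small sketch. This is not Spark's reader code: `ColumnBatch`, `append_page`, and `read` are hypothetical names, and the batch is modeled as a plain Python list that holds either dictionary ids or decoded values.

```python
from typing import List, Optional


class ColumnBatch:
    """Hypothetical column buffer, for illustration only.

    While `dictionary` is set, `slots` holds dictionary ids.
    While it is None, `slots` holds decoded values.
    """

    def __init__(self) -> None:
        self.slots: List = []
        self.dictionary: Optional[List[str]] = None

    def append_page(self, page: List, dictionary: Optional[List[str]] = None) -> None:
        if dictionary is None and self.dictionary is not None:
            # Switching from dictionary-encoded to plain values: decode the
            # ids already in the batch BEFORE dropping the dictionary.
            # Dropping it without decoding is what would leave the first
            # values as meaningless ids.
            self.slots = [self.dictionary[i] for i in self.slots]
            self.dictionary = None
        elif dictionary is not None and self.dictionary is None and self.slots:
            # Reverse case: the batch already holds plain values, so decode
            # the incoming dictionary ids eagerly instead of storing ids.
            page = [dictionary[i] for i in page]
        elif dictionary is not None and not self.slots:
            # Empty batch: safe to store ids and keep the dictionary.
            self.dictionary = dictionary
        self.slots.extend(page)

    def read(self) -> List[str]:
        """Return the decoded values of the batch."""
        if self.dictionary is None:
            return list(self.slots)
        return [self.dictionary[i] for i in self.slots]


# Dictionary-encoded page followed by a plain page.
batch = ColumnBatch()
batch.append_page([0, 1, 0], dictionary=["a", "b"])
batch.append_page(["c", "d"])
print(batch.read())

# Plain page followed by a dictionary-encoded page.
reverse = ColumnBatch()
reverse.append_page(["x"])
reverse.append_page([1, 0], dictionary=["y", "z"])
print(reverse.read())
```

The sketch keeps a single representation per batch at all times, which is the invariant the patch restores: a batch never mixes raw ids and decoded values.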
Diffstat (limited to 'python')
0 files changed, 0 insertions, 0 deletions