author | Rene Treffer <treffer+github@measite.de> | 2015-07-27 23:29:40 +0800 |
---|---|---|
committer | Cheng Lian <lian@databricks.com> | 2015-07-27 23:29:40 +0800 |
commit | aa19c696e25ebb07fd3df110cfcbcc69954ce335 (patch) | |
tree | f8d93995a6b7c91a799fe6529578bdcdba0eaff1 /docs/graphx-programming-guide.md | |
parent | 622838165756e9669cbf7af13eccbc719638f40b (diff) | |
[SPARK-4176] [SQL] Supports decimal types with precision > 18 in Parquet
This PR is based on #6796 authored by rtreffer.
To support large decimal precisions (> 18), this PR makes the following changes:
1. Making `CatalystSchemaConverter` support large decimal precision
Decimal types with large precision are always converted to fixed-length byte arrays.
2. Making `CatalystRowConverter` support reading decimal values with large precision
When the precision is > 18, the converter constructs `Decimal` values from an unscaled `BigInteger` rather than an unscaled `Long`.
3. Making `RowWriteSupport` support writing decimal values with large precision
In this PR we always write decimals as fixed-length byte arrays, because the Parquet write path hasn't yet been refactored to conform to the Parquet format spec (see SPARK-6774 & SPARK-8848).
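
The three changes above can be illustrated with a small standalone sketch. It is not the code from this PR: the class and method names (`DecimalFixedLenSketch`, `minBytesForPrecision`, `toFixedLenBytes`, `fromFixedLenBytes`) are hypothetical, and it assumes the unscaled value is stored as a big-endian two's-complement integer, sign-extended to the smallest byte width that fits the declared precision. It only shows why precision > 18 needs a `BigInteger` (the unscaled value no longer fits in a signed 64-bit `Long`) and how a fixed-length encoding can round-trip.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical illustration only; names and layout are assumptions, not Spark's API.
final class DecimalFixedLenSketch {

    // Smallest number of bytes whose signed two's-complement range can hold
    // every unscaled value with `precision` decimal digits (max 10^precision - 1).
    static int minBytesForPrecision(int precision) {
        BigInteger maxUnscaled = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
        int numBytes = 1;
        // A signed n-byte integer holds magnitudes up to 2^(8n-1) - 1.
        while (BigInteger.ONE.shiftLeft(8 * numBytes - 1)
                .subtract(BigInteger.ONE)
                .compareTo(maxUnscaled) < 0) {
            numBytes++;
        }
        return numBytes;
    }

    // Write path: encode the unscaled value as a fixed-length, big-endian,
    // sign-extended byte array sized for the declared precision.
    static byte[] toFixedLenBytes(BigDecimal value, int precision) {
        byte[] minimal = value.unscaledValue().toByteArray(); // minimal two's complement
        int width = minBytesForPrecision(precision);
        if (minimal.length > width) {
            throw new IllegalArgumentException("value does not fit precision " + precision);
        }
        byte[] out = new byte[width];
        int padLen = width - minimal.length;
        // Sign extension: 0xFF padding for negative values, 0x00 otherwise.
        Arrays.fill(out, 0, padLen, (byte) (value.signum() < 0 ? 0xFF : 0x00));
        System.arraycopy(minimal, 0, out, padLen, minimal.length);
        return out;
    }

    // Read path: rebuild the decimal from an unscaled BigInteger and the scale.
    static BigDecimal fromFixedLenBytes(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        BigDecimal original = new BigDecimal("-12345678901234567890.12"); // precision 22, scale 2
        byte[] encoded = toFixedLenBytes(original, 22);
        BigDecimal decoded = fromFixedLenBytes(encoded, original.scale());
        System.out.println(encoded.length + " bytes, round trip ok: " + original.equals(decoded));
    }
}
```

Under these assumptions, precision 18 still fits in 8 bytes, while precision 19 needs 9, which is where a `Long`-backed unscaled value stops being sufficient.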
Two follow-up tasks should be done in future PRs:
- [ ] Writing decimals as `INT32` or `INT64` when possible while fixing SPARK-8848
- [ ] Adding compatibility tests as part of SPARK-5463
Author: Cheng Lian <lian@databricks.com>
Closes #7455 from liancheng/spark-4176 and squashes the following commits:
a543d10 [Cheng Lian] Fixes errors introduced while rebasing
9e31cdf [Cheng Lian] Supports decimals with precision > 18 for Parquet