author | chutium <teng.qiu@gmail.com> | 2014-08-26 11:51:26 -0700 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2014-08-26 11:51:26 -0700 |
commit | 8856c3d86009295be871989a5dc7270f31b420cd (patch) | |
tree | c7fa1259197668c91da4f67fe6fb7d5fb7e4f641 /docker | |
parent | b21ae5bbb9baa966f69303a30659aa8bbb2098da (diff) | |
[SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext
There are 4 different compression codecs available for ```ParquetOutputFormat```.
In Spark SQL, the codec was set as a hard-coded value in ```ParquetRelation.defaultCompression```.
Original discussion:
https://github.com/apache/spark/pull/195#discussion-diff-11002083
I added a new config property in SQLConf that lets the user change this compression codec, using a short-name syntax similar to the one described in SPARK-2953 #1873 (https://github.com/apache/spark/pull/1873/files#diff-0).
Which codec should we use as the default? It was set to GZIP (https://github.com/apache/spark/pull/195/files#diff-4), but I think we should change it to SNAPPY, since SNAPPY is already the default codec for shuffling in spark-core (SPARK-2469, #1415), and parquet-mr supports the Snappy codec natively (https://github.com/Parquet/parquet-mr/commit/e440108de57199c12d66801ca93804086e7f7632).
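The short-name lookup described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from the patch: the property key `spark.sql.parquet.compression.codec`, the helper `resolve_codec`, and the four codec names (taken from parquet-mr's `CompressionCodecName`; the message itself names only GZIP and SNAPPY) are assumptions.

```python
# Hypothetical sketch of short-name codec resolution; not Spark's actual code.

# Assumed property key; the commit message does not spell it out.
PARQUET_CODEC_KEY = "spark.sql.parquet.compression.codec"

# Default proposed in the commit message.
DEFAULT_CODEC = "snappy"

# Assumed short names mapped to the four parquet-mr codec names.
SHORT_NAMES = {
    "uncompressed": "UNCOMPRESSED",
    "snappy": "SNAPPY",
    "gzip": "GZIP",
    "lzo": "LZO",
}


def resolve_codec(conf):
    """Return the codec name for the configured short name (case-insensitive)."""
    short = conf.get(PARQUET_CODEC_KEY, DEFAULT_CODEC).strip().lower()
    if short not in SHORT_NAMES:
        raise ValueError(
            "Unknown Parquet codec %r; expected one of %s"
            % (short, sorted(SHORT_NAMES))
        )
    return SHORT_NAMES[short]
```

With an empty configuration, the sketch falls back to the proposed default; an unrecognised name fails early instead of silently writing with some other codec.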
Author: chutium <teng.qiu@gmail.com>
Closes #2039 from chutium/parquet-compression and squashes the following commits:
2f44964 [chutium] [SPARK-3131][SQL] parquet compression default codec set to snappy, also in test suite
e578e21 [chutium] [SPARK-3131][SQL] compression codec config property name and default codec set to snappy
21235dc [chutium] [SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext