[SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext - spark

author	chutium <teng.qiu@gmail.com>	2014-08-26 11:51:26 -0700
committer	Michael Armbrust <michael@databricks.com>	2014-08-26 11:51:26 -0700
commit	8856c3d86009295be871989a5dc7270f31b420cd (patch)
tree	c7fa1259197668c91da4f67fe6fb7d5fb7e4f641 /python
parent	b21ae5bbb9baa966f69303a30659aa8bbb2098da (diff)
download	spark-8856c3d86009295be871989a5dc7270f31b420cd.tar.gz spark-8856c3d86009295be871989a5dc7270f31b420cd.tar.bz2 spark-8856c3d86009295be871989a5dc7270f31b420cd.zip

[SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext

There are 4 different compression codecs available for ```ParquetOutputFormat```.

In Spark SQL, the codec was set as a hard-coded value in ```ParquetRelation.defaultCompression```.

Original discussion: https://github.com/apache/spark/pull/195#discussion-diff-11002083

I added a new config property in SQLConf to allow the user to change this compression codec, and I used a short-name syntax similar to the one described in SPARK-2953 #1873 (https://github.com/apache/spark/pull/1873/files#diff-0).

By the way, which codec should we use as the default? It was set to GZIP (https://github.com/apache/spark/pull/195/files#diff-4), but I think we should change it to SNAPPY, since SNAPPY is already the default codec for shuffling in spark-core (SPARK-2469, #1415), and parquet-mr supports the Snappy codec natively (https://github.com/Parquet/parquet-mr/commit/e440108de57199c12d66801ca93804086e7f7632).

Author: chutium <teng.qiu@gmail.com>

Closes #2039 from chutium/parquet-compression and squashes the following commits:

2f44964 [chutium] [SPARK-3131][SQL] parquet compression default codec set to snappy, also in test suite
e578e21 [chutium] [SPARK-3131][SQL] compression codec config property name and default codec set to snappy
21235dc [chutium] [SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext
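
For illustration, the short-name lookup described in the message could be sketched roughly as follows. This is a dependency-free sketch and not the code from this commit: the `Codec` type, `ParquetCodecSketch`, and `resolve` are hypothetical names, and the four codec names listed are an assumption about which codecs the message refers to. Only the SNAPPY default comes from the message itself.

```scala
import java.util.Locale

// Hypothetical stand-in for the Parquet codec enum, so the sketch needs no dependencies.
sealed trait Codec
case object Uncompressed extends Codec
case object Snappy extends Codec
case object Gzip extends Codec
case object Lzo extends Codec

object ParquetCodecSketch {
  // Default used when the user sets nothing; the message proposes SNAPPY instead of GZIP.
  val DefaultCodec: Codec = Snappy

  // Assumed set of short names, matched case-insensitively.
  private val shortNames: Map[String, Codec] = Map(
    "uncompressed" -> Uncompressed,
    "snappy"       -> Snappy,
    "gzip"         -> Gzip,
    "lzo"          -> Lzo)

  // Resolve an optional user-supplied short name to a codec.
  // Unknown names fail loudly rather than silently falling back to the default.
  def resolve(configured: Option[String]): Codec = configured match {
    case None => DefaultCodec
    case Some(name) =>
      shortNames.getOrElse(
        name.trim.toLowerCase(Locale.ROOT),
        throw new IllegalArgumentException(s"Unknown parquet compression codec: $name"))
  }
}
```

In this sketch an unset property resolves to `Snappy`, and `ParquetCodecSketch.resolve(Some("GZIP"))` resolves to `Gzip`. Rejecting unknown names is a design choice of the sketch, not something the message specifies.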

Diffstat (limited to 'python')

0 files changed, 0 insertions, 0 deletions

