author: chutium <teng.qiu@gmail.com> 2014-08-26 11:51:26 -0700
committer: Michael Armbrust <michael@databricks.com> 2014-08-26 11:51:26 -0700
commit: 8856c3d86009295be871989a5dc7270f31b420cd
tree: c7fa1259197668c91da4f67fe6fb7d5fb7e4f641 /python
parent: b21ae5bbb9baa966f69303a30659aa8bbb2098da
[SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext
There are 4 different compression codecs available for ```ParquetOutputFormat``` in Spark SQL, but the codec was set as a hard-coded value in ```ParquetRelation.defaultCompression```.

Original discussion: https://github.com/apache/spark/pull/195#discussion-diff-11002083

I added a new config property in SQLConf to allow users to change this compression codec, and I used a short-name syntax similar to the one described in SPARK-2953 #1873 (https://github.com/apache/spark/pull/1873/files#diff-0).

By the way, which codec should we use as the default? It was set to GZIP (https://github.com/apache/spark/pull/195/files#diff-4), but I think we should maybe change it to SNAPPY, since SNAPPY is already the default codec for shuffling in spark-core (SPARK-2469, #1415), and parquet-mr supports the Snappy codec natively (https://github.com/Parquet/parquet-mr/commit/e440108de57199c12d66801ca93804086e7f7632).

Author: chutium <teng.qiu@gmail.com>

Closes #2039 from chutium/parquet-compression and squashes the following commits:

2f44964 [chutium] [SPARK-3131][SQL] parquet compression default codec set to snappy, also in test suite
e578e21 [chutium] [SPARK-3131][SQL] compression codec config property name and default codec set to snappy
21235dc [chutium] [SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext
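
To illustrate the short-name idea described above, here is a minimal, standalone Python sketch of how a config value could be resolved to one of the four codecs, with snappy as the fallback. It is not the code from this commit: the helper `resolve_codec`, the `SHORT_NAMES` table, and the plain-dict config are hypothetical, and the property key `spark.sql.parquet.compression.codec` is not stated in the message above, so treat it as an assumption here.

```python
# Hypothetical sketch: map a user-supplied short name to a Parquet codec name.
# The key below is assumed, not quoted from this commit message.
PARQUET_COMPRESSION_KEY = "spark.sql.parquet.compression.codec"

# The four codecs mentioned in the message, keyed by assumed short names.
SHORT_NAMES = {
    "uncompressed": "UNCOMPRESSED",
    "snappy": "SNAPPY",
    "gzip": "GZIP",
    "lzo": "LZO",
}

# The squashed commits set the default codec to snappy.
DEFAULT_CODEC = "snappy"


def resolve_codec(conf):
    """Return the codec name for the value stored in a dict-like config.

    Falls back to DEFAULT_CODEC when the key is absent and rejects
    unknown short names instead of silently picking a codec.
    """
    short_name = conf.get(PARQUET_COMPRESSION_KEY, DEFAULT_CODEC).strip().lower()
    try:
        return SHORT_NAMES[short_name]
    except KeyError:
        valid = ", ".join(sorted(SHORT_NAMES))
        raise ValueError(
            "unknown codec %r, expected one of: %s" % (short_name, valid)
        ) from None


if __name__ == "__main__":
    print(resolve_codec({}))
    print(resolve_codec({PARQUET_COMPRESSION_KEY: "gzip"}))
```

Rejecting unknown names early is a design choice of this sketch; it surfaces a typo in the config before any file is written rather than at write time.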
Diffstat (limited to 'python')
0 files changed, 0 insertions, 0 deletions