author: chutium <teng.qiu@gmail.com> 2014-08-26 11:51:26 -0700
committer: Michael Armbrust <michael@databricks.com> 2014-08-26 11:51:42 -0700
commit: 3a9d874d7a46ab8b015631d91ba479d9a0ba827f
tree: a17264a34f24ca20327289d5158448dc3b01e335 /python
parent: 0f947f1239831a6ed3b47af65816715999bbe57b
[SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext
There are 4 different compression codecs available for `ParquetOutputFormat` in Spark SQL. Until now the codec was a hard-coded value in `ParquetRelation.defaultCompression`.

Original discussion: https://github.com/apache/spark/pull/195#discussion-diff-11002083

I added a new config property in SQLConf to allow the user to change this compression codec, and I used a short-name syntax similar to the one described in SPARK-2953 #1873 (https://github.com/apache/spark/pull/1873/files#diff-0).

By the way, which codec should we use as the default? It was set to GZIP (https://github.com/apache/spark/pull/195/files#diff-4), but I think we should change this to SNAPPY, since SNAPPY is already the default codec for shuffling in spark-core (SPARK-2469, #1415), and parquet-mr supports the Snappy codec natively (https://github.com/Parquet/parquet-mr/commit/e440108de57199c12d66801ca93804086e7f7632).

Author: chutium <teng.qiu@gmail.com>

Closes #2039 from chutium/parquet-compression and squashes the following commits:

2f44964 [chutium] [SPARK-3131][SQL] parquet compression default codec set to snappy, also in test suite
e578e21 [chutium] [SPARK-3131][SQL] compression codec config property name and default codec set to snappy
21235dc [chutium] [SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext

(cherry picked from commit 8856c3d86009295be871989a5dc7270f31b420cd)
Signed-off-by: Michael Armbrust <michael@databricks.com>
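
For illustration only, the short-name lookup described in the message could be sketched roughly as below. This is not the code from the patch: the function and constant names are hypothetical, the property key `spark.sql.parquet.compression.codec` is an assumption (the message does not spell it out), and the four codec names are assumed to be the ones parquet-mr offered at the time. The snappy default follows the squashed commits listed above.

```python
# Hypothetical sketch of resolving a user-supplied short codec name.
# None of these names are taken from the Spark source tree.

# Assumed property key; the commit message does not state it.
CONF_KEY = "spark.sql.parquet.compression.codec"

# Default codec after this change, per the squashed commit messages.
DEFAULT_SHORT_NAME = "snappy"

# Assumed mapping from short names to Parquet codec names.
SHORT_NAMES = {
    "uncompressed": "UNCOMPRESSED",
    "snappy": "SNAPPY",
    "gzip": "GZIP",
    "lzo": "LZO",
}


def resolve_parquet_codec(conf):
    """Return the Parquet codec name selected by a dict of settings.

    Falls back to the default when the key is absent, and raises
    ValueError when the configured short name is not recognised.
    """
    short_name = conf.get(CONF_KEY, DEFAULT_SHORT_NAME)
    key = short_name.strip().lower()
    if key not in SHORT_NAMES:
        raise ValueError(
            "Unknown Parquet compression codec %r; expected one of %s"
            % (short_name, sorted(SHORT_NAMES))
        )
    return SHORT_NAMES[key]
```

Matching the name case-insensitively is a design choice of this sketch, so that `GZIP` and `gzip` select the same codec; the actual patch may handle case differently.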
Diffstat (limited to 'python')
0 files changed, 0 insertions, 0 deletions