[SPARK-14482][SQL] Change default Parquet codec from gzip to snappy - spark

author	Reynold Xin <rxin@databricks.com>	2016-04-08 23:52:04 -0700
committer	Reynold Xin <rxin@databricks.com>	2016-04-08 23:52:04 -0700
commit	2f0b882e5c8787b09bedcc8208e6dcc5662dbbab (patch)
tree	aa17c6aa99fdbe772e51cdb40095a2cff492f754 /project
parent	d7af736b2cf6c392b87e7b45c2d2219ef06979eb (diff)

[SPARK-14482][SQL] Change default Parquet codec from gzip to snappy

## What changes were proposed in this pull request?

Based on our tests, gzip decompression is very slow (< 100MB/s), making queries decompression bound. Snappy can decompress at ~ 500MB/s on a single core.

This patch changes the default compression codec for Parquet output from gzip to snappy, and also introduces a ParquetOptions class to be more consistent with other data sources (e.g. CSV, JSON).

## How was this patch tested?

Should be covered by existing unit tests.

Author: Reynold Xin <rxin@databricks.com>

Closes #12256 from rxin/SPARK-14482.
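
To make the effect of the new default concrete, here is a minimal, self-contained sketch of how a codec could be resolved for a Parquet write. It is not the ParquetOptions class from this patch. The function name `resolve_parquet_codec`, the precedence order (per-write option, then session setting, then default), the `compression` option name, the `spark.sql.parquet.compression.codec` key, and the set of accepted codec names are all assumptions made for illustration.

```python
# Illustrative sketch only; NOT Spark's actual ParquetOptions implementation.

# The new default described in this commit (previously gzip).
DEFAULT_CODEC = "snappy"

# Assumed set of accepted short codec names, for illustration.
SUPPORTED_CODECS = {"none", "uncompressed", "snappy", "gzip", "lzo"}

# Assumed session-level configuration key.
SESSION_KEY = "spark.sql.parquet.compression.codec"


def resolve_parquet_codec(write_options, session_conf):
    """Pick a codec: per-write option first, then session config, then default."""
    session_value = session_conf.get(SESSION_KEY, DEFAULT_CODEC)
    codec = write_options.get("compression", session_value).lower()
    if codec not in SUPPORTED_CODECS:
        raise ValueError(
            "Unknown Parquet codec %r; expected one of %s"
            % (codec, sorted(SUPPORTED_CODECS))
        )
    return codec


if __name__ == "__main__":
    # No explicit setting anywhere: the new default applies.
    print(resolve_parquet_codec({}, {}))
    # A job that still wants the old behaviour opts in explicitly.
    print(resolve_parquet_codec({"compression": "gzip"}, {}))
```

Under these assumptions, jobs that relied on the old default and still need gzip output would have to request it explicitly, either per write or at the session level.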

Diffstat (limited to 'project')

0 files changed, 0 insertions, 0 deletions