author | Reynold Xin <rxin@databricks.com> | 2016-04-08 23:52:04 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-04-08 23:52:04 -0700 |
commit | 2f0b882e5c8787b09bedcc8208e6dcc5662dbbab (patch) | |
tree | aa17c6aa99fdbe772e51cdb40095a2cff492f754 /build | |
parent | d7af736b2cf6c392b87e7b45c2d2219ef06979eb (diff) | |
[SPARK-14482][SQL] Change default Parquet codec from gzip to snappy
## What changes were proposed in this pull request?
Based on our tests, gzip decompression is very slow (< 100 MB/s), which makes queries decompression-bound. Snappy can decompress at ~500 MB/s on a single core.
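To make the gap concrete, here is a rough back-of-the-envelope sketch using the two throughput figures above. The 10 GB data size and the helper name `decompress_seconds` are illustrative assumptions, not part of the patch, and the calculation assumes both figures refer to the same volume of data on a single core.

```python
def decompress_seconds(data_mb: float, throughput_mb_per_s: float) -> float:
    """Time to decompress `data_mb` megabytes at a fixed single-core throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_mb / throughput_mb_per_s


# Throughput figures quoted in the commit message.
GZIP_MB_PER_S = 100.0    # upper bound: gzip was measured at < 100 MB/s
SNAPPY_MB_PER_S = 500.0  # approximate: snappy was measured at ~500 MB/s

# Hypothetical workload size, chosen only for illustration.
DATA_MB = 10_000.0  # 10 GB

gzip_s = decompress_seconds(DATA_MB, GZIP_MB_PER_S)
snappy_s = decompress_seconds(DATA_MB, SNAPPY_MB_PER_S)
speedup = gzip_s / snappy_s

print(f"gzip: >= {gzip_s:.0f} s, snappy: ~{snappy_s:.0f} s, speedup: ~{speedup:.0f}x")
```

Since the gzip figure is an upper bound on throughput, the gzip time here is a lower bound, so the real gap is probably somewhat larger than this estimate suggests.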
This patch changes the default compression codec for Parquet output from gzip to snappy, and also introduces a ParquetOptions class to be more consistent with other data sources (e.g. CSV, JSON).
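The following is a minimal Python sketch of how an options class of this kind might resolve the codec. It is not the code in this patch: the function name `resolve_parquet_codec`, the precedence order (per-write option, then session config, then default), and the set of accepted codec names are all assumptions made for illustration. The keys `compression` and `spark.sql.parquet.compression.codec` are, as far as I know, the names Spark uses for the per-write option and the session setting.

```python
from typing import Mapping, Optional

# New default introduced by this patch (previously gzip).
DEFAULT_CODEC = "snappy"

# Assumed session-level configuration key.
CONF_KEY = "spark.sql.parquet.compression.codec"

# Assumed set of accepted short codec names (illustrative only).
SUPPORTED_CODECS = frozenset({"none", "uncompressed", "snappy", "gzip", "lzo"})


def resolve_parquet_codec(
    write_options: Mapping[str, str],
    session_conf: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the codec: per-write option first, then session conf, then default."""
    conf = session_conf or {}
    fallback = conf.get(CONF_KEY, DEFAULT_CODEC)
    name = write_options.get("compression", fallback).lower()
    if name not in SUPPORTED_CODECS:
        raise ValueError(
            f"Codec [{name}] is not available. "
            f"Available codecs are {', '.join(sorted(SUPPORTED_CODECS))}."
        )
    return name


# No option and no session setting: falls back to the new default.
default_choice = resolve_parquet_codec({})

# A per-write option overrides the session setting (case-insensitive).
override_choice = resolve_parquet_codec(
    {"compression": "GZIP"}, {CONF_KEY: "snappy"}
)
```

Users who want to keep the old behavior should be able to opt back in explicitly, for example with something like `df.write.option("compression", "gzip").parquet(path)`.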
## How was this patch tested?
Should be covered by existing unit tests.
Author: Reynold Xin <rxin@databricks.com>
Closes #12256 from rxin/SPARK-14482.
Diffstat (limited to 'build')
0 files changed, 0 insertions, 0 deletions