author: Reynold Xin <rxin@databricks.com> 2016-04-08 23:52:04 -0700
committer: Reynold Xin <rxin@databricks.com> 2016-04-08 23:52:04 -0700
commit: 2f0b882e5c8787b09bedcc8208e6dcc5662dbbab
tree: aa17c6aa99fdbe772e51cdb40095a2cff492f754 /project
parent: d7af736b2cf6c392b87e7b45c2d2219ef06979eb
[SPARK-14482][SQL] Change default Parquet codec from gzip to snappy
## What changes were proposed in this pull request?
Based on our tests, gzip decompression is very slow (< 100MB/s), making queries decompression bound. Snappy can decompress at ~ 500MB/s on a single core.

This patch changes the default compression codec for Parquet output from gzip to snappy, and also introduces a ParquetOptions class to be more consistent with other data sources (e.g. CSV, JSON).

## How was this patch tested?
Should be covered by existing unit tests.

Author: Reynold Xin <rxin@databricks.com>

Closes #12256 from rxin/SPARK-14482.
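As a rough illustration of the throughput figures quoted in the message, the sketch below estimates single-core decompression time for each codec. The 10 GB data size is a hypothetical value, and the sketch assumes the quoted rates are measured over the same byte count for both codecs, which the message does not specify. It uses 100MB/s for gzip, the upper bound given, so the gzip time is a lower bound.

```python
# Rough single-core decompression time at the throughputs quoted in the
# commit message. The data size used below is hypothetical.

GZIP_MB_PER_S = 100.0    # upper bound quoted in the message (< 100MB/s)
SNAPPY_MB_PER_S = 500.0  # approximate figure quoted in the message


def decompress_seconds(size_mb: float, throughput_mb_per_s: float) -> float:
    """Seconds needed to decompress size_mb at the given throughput on one core."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_mb / throughput_mb_per_s


if __name__ == "__main__":
    size_mb = 10 * 1024  # hypothetical 10 GB of data
    gzip_s = decompress_seconds(size_mb, GZIP_MB_PER_S)
    snappy_s = decompress_seconds(size_mb, SNAPPY_MB_PER_S)
    print(f"gzip:   at least {gzip_s:.1f} s")
    print(f"snappy: about {snappy_s:.1f} s")
    print(f"ratio:  at least {gzip_s / snappy_s:.1f}x")
```

This only models decompression time; it ignores I/O, file size differences between the two codecs, and parallelism across cores.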
Diffstat (limited to 'project')
0 files changed, 0 insertions, 0 deletions