author | Davies Liu <davies@databricks.com> | 2016-04-04 14:41:03 -0700 |
---|---|---|
committer | Davies Liu <davies.liu@gmail.com> | 2016-04-04 14:41:03 -0700 |
commit | 400b2f863ffaa01a34a8dae1541c61526fef908b (patch) | |
tree | eb0773854538319d9534c2ebdb36a9eb65f513ae /project | |
parent | cc70f174169f45c85d459126a68bbe43c0bec328 (diff) | |
[SPARK-14259] [SQL] Merging small files together based on the cost of opening
## What changes were proposed in this pull request?
This PR redoes the work in #12068 with a different model, which should work better when the small files have different sizes.
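The commit message does not spell out the model, so the following is only a minimal sketch of one way to merge small files based on the cost of opening them. It assumes a greedy packing in which every file is charged its size plus a fixed per-file open cost. The function name `pack_files` and the parameters `max_partition_bytes` and `open_cost_bytes` are hypothetical and are not taken from the patch.

```python
from typing import List


def pack_files(sizes: List[int],
               max_partition_bytes: int,
               open_cost_bytes: int) -> List[List[int]]:
    """Greedily group file sizes into partitions.

    Hypothetical illustration: each file added to a partition is charged
    its size plus a fixed open cost, so many tiny files fill a partition
    faster than their raw byte count alone would suggest.
    """
    partitions: List[List[int]] = []
    current: List[int] = []
    current_cost = 0

    # Assumption: process the largest files first, so the small files
    # end up grouped together in the later partitions.
    for size in sorted(sizes, reverse=True):
        # Close the current partition if this file would overflow it.
        if current and current_cost + size > max_partition_bytes:
            partitions.append(current)
            current = []
            current_cost = 0
        current.append(size)
        current_cost += size + open_cost_bytes

    # Flush the last, possibly partially filled, partition.
    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    # Four 10-byte files, a 100-byte budget, and a 40-byte open cost.
    print(pack_files([10, 10, 10, 10], 100, 40))
```

With an open cost of zero the four files in the usage example would all land in one partition; charging for each open splits them into two, which is the effect a per-file cost is meant to have.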
## How was this patch tested?
Updated existing tests.
Ran a query locally on thousands of small partitioned files with all default settings (under which the cost to open a file should be overestimated). Task durations became progressively shorter, which is good: the last few tasks are the shortest.
Author: Davies Liu <davies@databricks.com>
Closes #12095 from davies/file_cost.