path: root/pom.xml
author Davies Liu <davies@databricks.com> 2016-04-04 14:41:03 -0700
committer Davies Liu <davies.liu@gmail.com> 2016-04-04 14:41:03 -0700
commit 400b2f863ffaa01a34a8dae1541c61526fef908b
tree eb0773854538319d9534c2ebdb36a9eb65f513ae /pom.xml
parent cc70f174169f45c85d459126a68bbe43c0bec328
[SPARK-14259] [SQL] Merging small files together based on the cost of opening
## What changes were proposed in this pull request?

This PR redoes the work in #12068 with a different model, which should work better for small files of different sizes.

## How was this patch tested?

Updated existing tests.

Ran a query on thousands of partitioned small files locally with all default settings (the cost to open a file should be overestimated). The durations of tasks become smaller and smaller, which is good: the last few tasks are the shortest.

Author: Davies Liu <davies@databricks.com>

Closes #12095 from davies/file_cost.
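
The idea of merging small files based on the cost of opening them can be illustrated with a minimal sketch. This is not the code from this patch: the function name `pack_files` and the parameters `max_split_bytes` and `open_cost_bytes` are hypothetical, and the greedy largest-first strategy is an assumption chosen to match the observation above that later tasks get shorter. Each file is charged its size plus a fixed open cost, so many tiny files fill a partition faster than their raw sizes suggest.

```python
from typing import List


def pack_files(file_sizes: List[int],
               max_split_bytes: int,
               open_cost_bytes: int) -> List[List[int]]:
    """Greedily group files into partitions (illustrative sketch only).

    Every file added to a partition is charged its size plus
    open_cost_bytes, a hypothetical fixed estimate of the cost of
    opening a file, expressed in bytes.
    """
    partitions: List[List[int]] = []
    current: List[int] = []
    current_size = 0

    # Largest files first, so the partitions created last hold the
    # smallest files and the corresponding tasks finish fastest.
    for size in sorted(file_sizes, reverse=True):
        # Close the current partition if this file would overflow it.
        if current and current_size + size > max_split_bytes:
            partitions.append(current)
            current = []
            current_size = 0
        current.append(size)
        current_size += size + open_cost_bytes

    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    print(pack_files([100, 5, 5, 5], max_split_bytes=50, open_cost_bytes=20))
```

With an open cost of zero, all four 10-byte files in `pack_files([10, 10, 10, 10], 50, 0)` fit into one partition. Raising the open cost to 10 splits them into two partitions, which shows how overestimating the open cost leads to more, smaller tasks.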
Diffstat (limited to 'pom.xml')
0 files changed, 0 insertions, 0 deletions