author | Sital Kedia <skedia@fb.com> | 2016-08-04 14:54:38 -0700 |
---|---|---|
committer | Josh Rosen <joshrosen@databricks.com> | 2016-08-04 14:54:38 -0700 |
commit | 9c15d079df2418a1412269a702f3a7861daee61c (patch) | |
tree | 713b294ebc99767ad3143df8815bbe2014f2cd3e /docs/mllib-optimization.md | |
parent | 0e2e5d7d0b42226c61c3200fd63d2831c558519d (diff) | |
[SPARK-15074][SHUFFLE] Cache shuffle index file to speedup shuffle fetch
## What changes were proposed in this pull request?
Shuffle fetch on a large intermediate dataset is slow because the shuffle service opens and closes the index file for every shuffle fetch. This change introduces a cache for the index information so that the index files do not have to be accessed for each block fetch.
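
The caching idea can be illustrated with a minimal sketch. This is not the code in this patch: the class name `ShuffleIndexCacheSketch`, the method `blockRange`, and the LRU policy based on `LinkedHashMap` are all assumptions made for illustration. The sketch also assumes the index file is a flat sequence of 8-byte offsets, so that block `reduceId` spans from `offsets[reduceId]` to `offsets[reduceId + 1]`.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of an index cache; names and layout are assumptions. */
public class ShuffleIndexCacheSketch {
    private final int maxEntries;
    private final Map<Path, long[]> cache;
    private int fileReads = 0; // counts how often an index file is actually opened

    public ShuffleIndexCacheSketch(int maxEntries) {
        this.maxEntries = maxEntries;
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.cache = new LinkedHashMap<Path, long[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Path, long[]> eldest) {
                return size() > ShuffleIndexCacheSketch.this.maxEntries;
            }
        };
    }

    /** Returns {offset, length} of one block, reading the index file only on a cache miss. */
    public synchronized long[] blockRange(Path indexFile, int reduceId) throws IOException {
        long[] offsets = cache.get(indexFile);
        if (offsets == null) {
            offsets = readOffsets(indexFile);
            cache.put(indexFile, offsets);
        }
        long start = offsets[reduceId];
        return new long[] { start, offsets[reduceId + 1] - start };
    }

    /** Loads every offset from the index file in a single pass. */
    private long[] readOffsets(Path indexFile) throws IOException {
        fileReads++;
        long[] offsets = new long[(int) (Files.size(indexFile) / Long.BYTES)];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(indexFile))) {
            for (int i = 0; i < offsets.length; i++) {
                offsets[i] = in.readLong();
            }
        }
        return offsets;
    }

    public synchronized int fileReads() {
        return fileReads;
    }
}
```

With a cache like this, repeated fetches against the same map output touch the index file once, and later fetches are served from memory. A size bound matters because a long-running shuffle service may see many index files over its lifetime.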
## How was this patch tested?
Tested by running a job on the cluster; the shuffle read time was reduced by 50%.
Author: Sital Kedia <skedia@fb.com>
Closes #12944 from sitalkedia/shuffle_service.