[SPARK-15074][SHUFFLE] Cache shuffle index file to speed up shuffle fetch

## What changes were proposed in this pull request?

Shuffle fetch on a large intermediate dataset is slow because the shuffle service opens and closes the index file for each shuffle fetch. This change introduces a cache for the index information so that the index files need not be accessed for each block fetch.

## How was this patch tested?

Tested by running a job on the cluster; the shuffle read time was reduced by 50%.

Author: Sital Kedia <skedia@fb.com>

Closes #12944 from sitalkedia/shuffle_service.
author: Sital Kedia <skedia@fb.com> 2016-08-04 14:54:38 -0700
committer: Josh Rosen <joshrosen@databricks.com> 2016-08-04 14:54:38 -0700
commit: 9c15d079df2418a1412269a702f3a7861daee61c (patch)
tree: 713b294ebc99767ad3143df8815bbe2014f2cd3e /docs/configuration.md
parent: 0e2e5d7d0b42226c61c3200fd63d2831c558519d (diff)
download: spark-9c15d079df2418a1412269a702f3a7861daee61c.tar.gz
spark-9c15d079df2418a1412269a702f3a7861daee61c.tar.bz2
spark-9c15d079df2418a1412269a702f3a7861daee61c.zip
1 files changed, 7 insertions, 0 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index bf10b24819..cc6b2b6470 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -522,6 +522,13 @@ Apart from these, the following properties are also available, and may be useful
   </td>
 </tr>
 <tr>
+  <td><code>spark.shuffle.service.index.cache.entries</code></td>
+  <td>1024</td>
+  <td>
+    Max number of entries to keep in the index cache of the shuffle service.
+  </td>
+</tr>
+<tr>
   <td><code>spark.shuffle.sort.bypassMergeThreshold</code></td>
   <td>200</td>
   <td>
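
The idea behind the new `spark.shuffle.service.index.cache.entries` setting can be illustrated with a small bounded cache. The following is only a minimal sketch under stated assumptions, not the code from this patch: the class name `IndexCacheSketch`, the `loader` function, and the use of `LinkedHashMap` with least-recently-used eviction are all hypothetical choices made for illustration. It shows how capping the number of cached entries lets repeated fetches reuse index information instead of re-reading the index file each time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded cache for index information; not Spark's actual class. */
class IndexCacheSketch<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader; // stands in for "open, read, close the index file"
    private int loads = 0;               // counts how often the loader was invoked

    IndexCacheSketch(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps iteration order from least to most recently used.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the cap is exceeded.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, invoking the loader only on a cache miss. */
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            entries.put(key, value);
        }
        return value;
    }

    synchronized int loadCount() {
        return loads;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With a cap of 1024 (the documented default), up to 1024 distinct keys would stay cached in this sketch; a fetch for a cached key calls `get` without touching the loader, while a miss loads the entry and may evict the least recently used one.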