| field | value | date |
|---|---|---|
| author | Shivaram Venkataraman <shivaram@cs.berkeley.edu> | 2015-06-10 15:03:40 -0700 |
| committer | Kay Ousterhout <kayousterhout@gmail.com> | 2015-06-10 15:04:38 -0700 |
| commit | 96a7c888d806adfdb2c722025a1079ed7eaa2052 | |
| tree | 95837a4607231ea5603fe947926ba2f67fa59a52 /mllib | |
| parent | 5014d0ed7e2f69810654003f8dd38078b945cf05 | |
[SPARK-2774] Set preferred locations for reduce tasks
Set preferred locations for reduce tasks.
The basic design is that, for each shuffle, we maintain a map from reducerId to a list of (size, location)
pairs. We then set the preferred locations to be any machines that hold 20% or more of the output
that needs to be read by the reduce task. Because each such machine holds at least one fifth of that
output, this results in at most 5 preferred locations for each reduce task.
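The selection rule above can be sketched as follows. This is a minimal illustration, not Spark's implementation. The function name `preferred_locations`, the `(host, sizes)` input shape, and the constant `FRACTION_THRESHOLD` are hypothetical. Only the 20% cutoff and the skipping of empty blocks come from this commit.

```python
from collections import defaultdict

# Hypothetical constant: a host must hold at least 20% of a reducer's
# input bytes to become a preferred location (cutoff taken from the commit text).
FRACTION_THRESHOLD = 0.2


def preferred_locations(map_outputs, reducer_id, fraction=FRACTION_THRESHOLD):
    """Return the sorted hosts that hold >= `fraction` of reducer_id's input.

    map_outputs: one (host, sizes) pair per map task, where sizes[r] is the
    number of bytes that map task wrote for reducer r (hypothetical shape).
    """
    bytes_by_host = defaultdict(int)
    total = 0
    for host, sizes in map_outputs:
        size = sizes[reducer_id]
        if size > 0:  # skip empty blocks; they contribute nothing to read
            bytes_by_host[host] += size
            total += size
    if total == 0:
        return []
    # Keep every host whose share of the reducer's input meets the cutoff.
    return sorted(h for h, b in bytes_by_host.items() if b / total >= fraction)


map_outputs = [
    ("hostA", [100, 0]),
    ("hostB", [300, 10]),
    ("hostA", [50, 90]),
    ("hostC", [40, 0]),
]
print(preferred_locations(map_outputs, 0))  # ['hostA', 'hostB']
print(preferred_locations(map_outputs, 1))  # ['hostA']
```

Each selected host holds at least 20% of the reducer's bytes, and the shares sum to 100%. At most 5 hosts can therefore pass the cutoff. In the example, hostC holds only about 8% of reducer 0's input, so it is not chosen.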
Selecting the preferred locations involves O(# map tasks * # reduce tasks) computation, so we
restrict this feature to cases where we have fewer than 1000 map tasks and 1000 reduce tasks.
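The size gate can be sketched as a simple predicate. The 1000-task limits come from the commit text. The names `MAX_MAP_TASKS`, `MAX_REDUCE_TASKS`, `should_compute_reduce_locality`, and the `locality_enabled` switch are hypothetical. The switch loosely mirrors the "flag to turn off locality for shuffle deps" mentioned in the squashed commits, whose actual configuration key is not shown here.

```python
# Hypothetical names; the 1000-task limits are taken from the commit text.
MAX_MAP_TASKS = 1000
MAX_REDUCE_TASKS = 1000


def should_compute_reduce_locality(num_map_tasks, num_reduce_tasks, locality_enabled=True):
    """Decide whether to compute preferred locations for a shuffle's reducers.

    Picking locations for every reducer scans every map output, so the cost is
    O(num_map_tasks * num_reduce_tasks). Only small shuffles pay that cost.
    """
    return (
        locality_enabled
        and num_map_tasks < MAX_MAP_TASKS
        and num_reduce_tasks < MAX_REDUCE_TASKS
    )
```

Under these limits the worst case is 999 * 999 = 998,001 per-block size lookups for one shuffle, which bounds the scheduling overhead.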
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6652 from shivaram/reduce-locations and squashes the following commits:
492e25e [Shivaram Venkataraman] Remove unused import
2ef2d39 [Shivaram Venkataraman] Address code review comments
897a914 [Shivaram Venkataraman] Remove unused hash map
f5be578 [Shivaram Venkataraman] Use fraction of map outputs to determine locations. Also removes caching of preferred locations to make the API cleaner
68bc29e [Shivaram Venkataraman] Fix line length
1090b58 [Shivaram Venkataraman] Change flag name
77ce7d8 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
e5d56bd [Shivaram Venkataraman] Add flag to turn off locality for shuffle deps
6cfae98 [Shivaram Venkataraman] Filter out zero blocks, rename variables
9d5831a [Shivaram Venkataraman] Address some more comments
8e31266 [Shivaram Venkataraman] Fix style
0df3180 [Shivaram Venkataraman] Address code review comments
e7d5449 [Shivaram Venkataraman] Fix merge issues
ad7cb53 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
df14cee [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
5093aea [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
0171d3c [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
bc4dfd6 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
774751b [Shivaram Venkataraman] Fix bug introduced by line length adjustment
34d0283 [Shivaram Venkataraman] Fix style issues
3b464b7 [Shivaram Venkataraman] Set preferred locations for reduce tasks. This is another attempt at #1697, addressing some of the earlier concerns. This adds a couple of thresholds based on the number of map and reduce tasks beyond which we don't use preferred locations for reduce tasks.