author: Kay Ousterhout <kayousterhout@gmail.com> 2016-06-17 12:12:46 -0700
committer: Kay Ousterhout <kayousterhout@gmail.com> 2016-06-17 12:12:46 -0700
commit: c8809db5a5ae4111e193907ac35929a906ddac3e
tree: 2f86290f32fcbb5313962c1f196af178ebb4d57d /common
parent: 1f0a46958ef51a01560ada23665dccde89696e12
[SPARK-15926] Improve readability of DAGScheduler stage creation methods
## What changes were proposed in this pull request?

This pull request refactors parts of the DAGScheduler to improve readability, focusing on the code around stage creation. One goal of this change is to make it clearer which functions may create new stages (as opposed to looking up stages that already exist). There are no functionality changes in this pull request beyond the one slight change noted in the list below.

In more detail:

* shuffleToMapStage was renamed to shuffleIdToMapStage (when reading the existing code I have sometimes struggled to remember what the key is -- is it a stage? A stage id? This change is intended to avoid that confusion).
* Cleaned up the code to create shuffle map stages. Previously, creating a shuffle map stage involved 3 different functions (newOrUsedShuffleStage, newShuffleMapStage, and getShuffleMapStage), and it wasn't clear what the purpose of each function was. With the new code, a single function (getOrCreateShuffleMapStage) is responsible for getting a stage (if it already exists) or creating new shuffle map stages and any missing ancestor stages, and it delegates to createShuffleMapStage when new stages need to be created. There's some remaining confusion here because the getOrCreateParentStages call in createShuffleMapStage may recursively create ancestor stages; this is an issue I plan to fix in a future pull request, because it's trickier to fix and involves a slight functionality change.
* newResultStage was renamed to createResultStage, for consistency with naming around shuffle map stages.
* getParentStages has been renamed to getOrCreateParentStages, to make it clear that this function will sometimes create missing ancestor stages.
* The only *slight* functionality change is that on line 478, updateJobIdStageIdMaps now uses a stage's parents instance variable rather than re-calculating them (I couldn't see any reason why they'd need to be re-calculated, and suspect this is just leftover from older code).
* getAncestorShuffleDependencies was renamed to getMissingAncestorShuffleDependencies, to make it clear that this only returns dependencies that have not yet been run.

cc squito markhamstra JoshRosen (who requested more DAG scheduler commenting long ago -- an issue this pull request tries, in part, to address)

FYI rxin

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #13677 from kayousterhout/SPARK-15926.
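
For readers unfamiliar with the naming convention described above, here is a minimal sketch of the get-or-create split. It is not the DAGScheduler code: `ShuffleDep`, `MapStage`, and `StageRegistry` are hypothetical stand-ins, and only the names `shuffleIdToMapStage`, `getOrCreateShuffleMapStage`, and `createShuffleMapStage` are borrowed from the description. The signatures and bodies are assumptions made for illustration.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins; these are NOT Spark's real classes.
final case class ShuffleDep(shuffleId: Int, parents: List[ShuffleDep] = Nil)
final case class MapStage(id: Int, shuffleId: Int, parents: List[MapStage])

class StageRegistry {
  // Keyed by shuffle id (not by stage or stage id), which the new name makes explicit.
  val shuffleIdToMapStage: mutable.HashMap[Int, MapStage] = mutable.HashMap.empty
  private var nextStageId = 0

  // The name signals that a call may create stages, not only look them up.
  def getOrCreateShuffleMapStage(dep: ShuffleDep): MapStage =
    shuffleIdToMapStage.get(dep.shuffleId) match {
      case Some(stage) => stage                      // stage already exists: plain lookup
      case None        => createShuffleMapStage(dep) // delegate creation
    }

  // Always creates a new stage. Resolving parents goes back through
  // getOrCreateShuffleMapStage, so missing ancestors are created recursively.
  def createShuffleMapStage(dep: ShuffleDep): MapStage = {
    val parents = dep.parents.map(getOrCreateShuffleMapStage)
    val stage = MapStage(nextStageId, dep.shuffleId, parents)
    nextStageId += 1
    shuffleIdToMapStage(dep.shuffleId) = stage
    stage
  }
}

object StageRegistryDemo {
  def main(args: Array[String]): Unit = {
    val a = ShuffleDep(0)
    val b = ShuffleDep(1, List(a))
    val c = ShuffleDep(2, List(b))
    val registry = new StageRegistry
    val stage = registry.getOrCreateShuffleMapStage(c)
    // A second call for the same shuffle id returns the registered stage.
    println(registry.getOrCreateShuffleMapStage(c) eq stage)
    println(registry.shuffleIdToMapStage.size)
  }
}
```

In this sketch the recursion through the parent lookup is what the description calls the remaining source of confusion: a function named `create...` ends up creating more than one stage.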