author Josh Rosen <joshrosen@databricks.com> 2015-02-17 17:39:58 -0800
committer Patrick Wendell <patrick@databricks.com> 2015-02-17 17:40:04 -0800
commit 07a401a7beea864092ec8f8c451e05cba5a19bbb (patch)
tree ef895fc6392dedd4adcf2551d5fc2f31487960ed /graphx/pom.xml
parent cb905841b2eaa19e28a1327cab0e5d51f805d233 (diff)
[SPARK-4454] Properly synchronize accesses to DAGScheduler cacheLocs map
This patch addresses a race condition in DAGScheduler by properly synchronizing accesses to its `cacheLocs` map.

This map is accessed by the `getCacheLocs` and `clearCacheLocs()` methods, which can be called by separate threads, since DAGScheduler's `getPreferredLocs()` method is called by SparkContext and indirectly calls `getCacheLocs()`. If this map is cleared by the DAGScheduler event processing thread while a user thread is submitting a job and computing preferred locations, then this can cause the user thread to throw "NoSuchElementException: key not found" errors.

Most accesses to DAGScheduler's internal state do not need synchronization because that state is only accessed from the event processing loop's thread. An alternative approach to fixing this bug would be to refactor this code so that SparkContext sends the DAGScheduler a message in order to get the list of preferred locations. However, this would involve more extensive changes to this code and would be significantly harder to backport to maintenance branches since some of the related code has undergone significant refactoring (e.g. the introduction of EventLoop). Since `cacheLocs` is the only state that's accessed in this way, adding simple synchronization seems like a better short-term fix.

See #3345 for additional context.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #4660 from JoshRosen/SPARK-4454 and squashes the following commits:

12d64ba [Josh Rosen] Properly synchronize accesses to DAGScheduler cacheLocs map.

(cherry picked from commit d46d6246d225ff3af09ebae1a09d4de2430c502d)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
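The race described in the commit message is a check-then-read on a shared map: one thread tests for a key, populates it, and reads it back, while another thread may clear the map in between. The sketch below illustrates the "simple synchronization" idea in Python rather than Spark's Scala, so it is an analogue and not the patch itself. `CacheLocsRegistry`, `compute_locs`, and `stress` are hypothetical names invented for illustration, and Python's `KeyError` stands in for the JVM's `NoSuchElementException`.

```python
import threading


class CacheLocsRegistry:
    """Hypothetical stand-in for the scheduler state described above."""

    def __init__(self, compute_locs):
        self._cache_locs = {}          # rdd_id -> list of cached locations
        self._lock = threading.Lock()  # guards every access to _cache_locs
        self._compute_locs = compute_locs

    def get_cache_locs(self, rdd_id):
        # The membership test, the insert, and the read all happen under one
        # lock, so a concurrent clear cannot remove the key between them.
        with self._lock:
            if rdd_id not in self._cache_locs:
                self._cache_locs[rdd_id] = self._compute_locs(rdd_id)
            return self._cache_locs[rdd_id]

    def clear_cache_locs(self):
        # Clearing takes the same lock as the readers.
        with self._lock:
            self._cache_locs.clear()


def stress(registry, iterations=2000):
    """Run a 'user' thread and an 'event loop' thread against the registry.

    Returns the list of KeyErrors seen by the user thread.
    """
    errors = []

    def submit_jobs():
        try:
            for i in range(iterations):
                registry.get_cache_locs(i % 8)
        except KeyError as exc:
            errors.append(exc)

    def process_events():
        for _ in range(iterations):
            registry.clear_cache_locs()

    threads = [
        threading.Thread(target=submit_jobs),
        threading.Thread(target=process_events),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Because both methods take the same lock, `stress` should return an empty list. The message-passing alternative mentioned in the commit message would instead route lookups through the event loop thread, which avoids the lock but requires larger changes.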
Diffstat (limited to 'graphx/pom.xml')
0 files changed, 0 insertions, 0 deletions