[SPARK-15736][CORE] Gracefully handle loss of DiskStore files - spark


author	Josh Rosen <joshrosen@databricks.com>	2016-06-02 17:36:31 -0700
committer	Andrew Or <andrew@databricks.com>	2016-06-02 17:36:31 -0700
commit	229f90225748343972d7202c5567b45364cd8497 (patch)
tree	fe77b0b2ecbfae61ac8fde4ad5bdb4b7de47a579 /sql/hive
parent	5855e0057defeab8006ca4f7b0196003bbc9e899 (diff)
download	spark-229f90225748343972d7202c5567b45364cd8497.tar.gz spark-229f90225748343972d7202c5567b45364cd8497.tar.bz2 spark-229f90225748343972d7202c5567b45364cd8497.zip

[SPARK-15736][CORE] Gracefully handle loss of DiskStore files

If an RDD partition is cached on disk and the DiskStore file is lost, then reads of that cached partition will fail and the missing partition is supposed to be recomputed by a new task attempt. In the current BlockManager implementation, however, the missing file does not trigger any metadata updates and does not invalidate the cache, so subsequent task attempts will be scheduled on the same executor and the doomed read will be retried repeatedly, leading to repeated task failures and eventually a total job failure.

To fix this problem, the executor with the missing file needs to mark the corresponding block as missing so that it stops advertising itself as a cache location for that block.

This patch fixes this bug and adds an end-to-end regression test (in `FailureSuite`) and a set of unit tests (in `BlockManagerSuite`).

Author: Josh Rosen <joshrosen@databricks.com>

Closes #13473 from JoshRosen/handle-missing-cache-files.
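
The failure mode described above can be illustrated with a small toy model. This is only a sketch, not Spark's code: `ToyBlockManager`, `run_job`, the `invalidate_on_missing` flag, and the retry limit of 4 are all hypothetical names and values made up for illustration. It assumes the essence of the fix is that a failed local read drops the block from the executor's metadata, so that the next attempt sees a cache miss and recomputes the partition.

```python
class ToyBlockManager:
    """Toy stand-in for an executor-side block manager (not Spark's API)."""

    def __init__(self, invalidate_on_missing):
        self.disk = {}           # block_id -> bytes; stands in for DiskStore files
        self.block_info = set()  # block ids this executor advertises as cached
        self.invalidate_on_missing = invalidate_on_missing

    def put(self, block_id, data):
        self.disk[block_id] = data
        self.block_info.add(block_id)

    def lose_file(self, block_id):
        # The file disappears, but the metadata is not told about it.
        del self.disk[block_id]

    def get(self, block_id):
        if block_id not in self.block_info:
            return None  # cache miss: the caller recomputes the partition
        if block_id not in self.disk:
            if self.invalidate_on_missing:
                # Assumed fix: stop advertising the block as cached here.
                self.block_info.discard(block_id)
            raise IOError("missing file for block " + block_id)
        return self.disk[block_id]


def run_job(bm, block_id, compute, max_attempts=4):
    """Retry a task up to max_attempts times; return (attempts_used, data)."""
    for attempt in range(1, max_attempts + 1):
        try:
            data = bm.get(block_id)
            if data is None:
                data = compute()       # recompute the lost partition
                bm.put(block_id, data)  # and cache it again
            return attempt, data
        except IOError:
            continue  # task attempt failed; schedule another one
    raise RuntimeError("job failed after %d attempts" % max_attempts)
```

With `invalidate_on_missing=False`, every attempt hits the same doomed read and `run_job` raises once the attempts are exhausted. With `invalidate_on_missing=True`, the first attempt fails, the block is dropped from the metadata, and the second attempt recomputes the partition and succeeds.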

Diffstat (limited to 'sql/hive')

0 files changed, 0 insertions, 0 deletions

