author: Aaron Davidson <aaron@databricks.com> 2013-09-17 09:40:06 -0700
committer: Aaron Davidson <aaron@databricks.com> 2013-09-26 14:59:35 -0700
commit: d5a96feccb15dd290b282af9e2f94479c8e4554e
tree: 55146010e613178553ff6fd1bc35e5d4d53addcf /project/SparkBuild.scala
parent: 13eced723f222095ea4b52c4f6cb078cae66342e
Standalone Scheduler fault recovery
Implements a basic form of fault recovery for the Standalone Scheduler. In particular, this allows manual recovery from faults by restarting the Master process on the same machine. This is the majority of the code necessary for general fault tolerance, which will first elect a leader and then recover the Master state. To enable fault recovery, the Master persists a small amount of state related to the registration of Workers and Applications to disk. If the Master starts and finds that this state is still present, it enters Recovery mode, during which it does not schedule any new Executors on Workers (though it still accepts registrations from new Clients and Workers). The Master then attempts to reconnect to all Workers and Client applications that were registered at the time of failure. Once it has confirmed either the existence or the nonexistence of each such node (within a certain timeout), the Master exits Recovery mode and resumes normal scheduling.
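The recovery flow described above can be pictured as a small state machine. The following is a minimal sketch under stated assumptions, not the code from this commit: every name here (`RecoveryState`, `Registration`, `PersistenceEngine`, `InMemoryPersistence`, `MasterSketch`, `timeoutMs`) is hypothetical, the in-memory store stands in for the on-disk state, and the actual reconnect messages to Workers and Clients are omitted. It shows how persisted registrations trigger Recovery mode, how scheduling is suppressed while new registrations are still accepted, and how nodes that are not confirmed before the timeout are presumed gone.

```scala
import scala.collection.mutable

// Hypothetical states; a plain Enumeration keeps the sketch self-contained.
object RecoveryState extends Enumeration {
  val Alive, Recovering = Value
}

// Hypothetical record of one Worker or Application registration.
final case class Registration(kind: String, id: String)

// Hypothetical persistence interface; a real engine would write to disk.
trait PersistenceEngine {
  def add(r: Registration): Unit
  def remove(r: Registration): Unit
  def readPersistedData(): Seq[Registration]
}

// In-memory stand-in for the on-disk state, used only for illustration.
final class InMemoryPersistence extends PersistenceEngine {
  private val store = mutable.LinkedHashSet.empty[Registration]
  def add(r: Registration): Unit = store += r
  def remove(r: Registration): Unit = store -= r
  def readPersistedData(): Seq[Registration] = store.toSeq
}

// Hypothetical Master; `now` is injected so the timeout can be tested.
final class MasterSketch(persistence: PersistenceEngine, timeoutMs: Long, now: () => Long) {
  private val known = mutable.Set.empty[Registration]
  private val unconfirmed = mutable.Set.empty[Registration]
  private var deadline = 0L
  private var current = RecoveryState.Alive

  def state: RecoveryState.Value = current

  // On startup, leftover persisted state puts the Master into Recovery mode.
  def start(): Unit = {
    val persisted = persistence.readPersistedData()
    if (persisted.nonEmpty) {
      known ++= persisted
      unconfirmed ++= persisted
      deadline = now() + timeoutMs
      current = RecoveryState.Recovering
      // A real Master would try to reconnect to each persisted node here.
    }
  }

  // New registrations are accepted in every state and persisted.
  def register(r: Registration): Unit = {
    known += r
    persistence.add(r)
  }

  // A previously registered node answered the reconnect attempt.
  def confirmAlive(r: Registration): Unit = {
    unconfirmed -= r
    maybeCompleteRecovery()
  }

  // No new Executors are scheduled while recovering.
  def canSchedule: Boolean = current == RecoveryState.Alive

  // Exit Recovery mode once every node is confirmed or the timeout passes;
  // anything still unconfirmed at that point is presumed dead and forgotten.
  def maybeCompleteRecovery(): Unit = {
    if (current == RecoveryState.Recovering && (unconfirmed.isEmpty || now() >= deadline)) {
      unconfirmed.foreach { r => known -= r; persistence.remove(r) }
      unconfirmed.clear()
      current = RecoveryState.Alive
    }
  }

  def registered: Set[Registration] = known.toSet
}
```

Injecting the clock as `now` is a design choice for testability; a real Master would more likely schedule a timer message that triggers the same completion check.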
Diffstat (limited to 'project/SparkBuild.scala')
0 files changed, 0 insertions, 0 deletions