Merge pull request #19 from aarondav/master-zk

Standalone Scheduler fault tolerance using ZooKeeper

This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one Master Leader at a time, and it is the only one actively serving scheduling requests. If this Leader crashes, another Master will eventually be elected, reconstruct the state of the previous Leader, and continue serving scheduling requests.

Leader election is performed using the ZooKeeper leader election pattern. We try to minimize both the use of ZooKeeper and the assumptions made about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client.

Master failover follows directly from the single-node Master recovery via the file system (patch d5a96fe), except that the Master state is stored in ZooKeeper instead.

Configuration:

By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).

Setting spark.deploy.recoveryMode to ZOOKEEPER and spark.deploy.zookeeper.url to an appropriate ZooKeeper URL enables ZooKeeper recovery mode.

Setting spark.deploy.recoveryMode to FILESYSTEM and spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master keeps the behavior from d5a96fe.

Additionally, places where a Master could be specified by a spark:// URL can now take comma-delimited lists to specify backup Masters. Note that this list is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again.
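
The configuration rules above can be sketched as a small, self-contained Scala object. This is only an illustration, not the code in this patch: the object name, the RecoveryConfig types, and both functions are hypothetical, and only the property names and mode values come from the commit message. The exact syntax of the comma-delimited Master list is also an assumption (scheme given once, followed by comma-separated host:port pairs).

```scala
object StandaloneRecoverySketch {
  // Hypothetical model of the three recovery modes named in the commit message.
  sealed trait RecoveryConfig
  case object NoRecovery extends RecoveryConfig
  final case class ZooKeeperRecovery(zkUrl: String) extends RecoveryConfig
  final case class FileSystemRecovery(directory: String) extends RecoveryConfig

  // Resolve the recovery mode from a property map.
  // A missing spark.deploy.recoveryMode falls back to NONE, the stated default.
  def resolveRecovery(props: Map[String, String]): Either[String, RecoveryConfig] =
    props.getOrElse("spark.deploy.recoveryMode", "NONE") match {
      case "NONE" =>
        Right(NoRecovery)
      case "ZOOKEEPER" =>
        props.get("spark.deploy.zookeeper.url")
          .map(url => ZooKeeperRecovery(url))
          .toRight("ZOOKEEPER mode requires spark.deploy.zookeeper.url")
      case "FILESYSTEM" =>
        props.get("spark.deploy.recoveryDirectory")
          .map(dir => FileSystemRecovery(dir))
          .toRight("FILESYSTEM mode requires spark.deploy.recoveryDirectory")
      case other =>
        Left(s"Unknown recovery mode: $other")
    }

  // Split a comma-delimited Master list into one URL per Master.
  // Assumed input format: "spark://host1:port1,host2:port2".
  def parseMasters(url: String): List[String] = {
    val prefix = "spark://"
    require(url.startsWith(prefix), s"expected a spark:// URL, got: $url")
    url.stripPrefix(prefix)
      .split(",")
      .toList
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(hostPort => prefix + hostPort)
  }

  def main(args: Array[String]): Unit = {
    val props = Map(
      "spark.deploy.recoveryMode" -> "ZOOKEEPER",
      "spark.deploy.zookeeper.url" -> "zk1:2181,zk2:2181" // hypothetical hosts
    )
    println(resolveRecovery(props))
    println(parseMasters("spark://master1:7077,master2:7077"))
  }
}
```

A new Worker or Client would try the parsed Master URLs when registering. After registration the list is no longer needed, as noted above.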
author: Matei Zaharia <matei@eecs.berkeley.edu> 2013-10-10 17:16:42 -0700
committer: Matei Zaharia <matei@eecs.berkeley.edu> 2013-10-10 17:16:42 -0700
commit: c71499b7795564e1d16495c59273ecc027070fc5 (patch)
tree: 3476cb0d4836bbb25308bb8f65e6a1fbdeea2b1a /project
parent: cd08f73483658b872701ec1f74ce84933a45c6f0 (diff)
parent: 66c20635fa1fe18604bb4042ce31152180cb541d (diff)
download: spark-c71499b7795564e1d16495c59273ecc027070fc5.tar.gz
spark-c71499b7795564e1d16495c59273ecc027070fc5.tar.bz2
spark-c71499b7795564e1d16495c59273ecc027070fc5.zip
1 files changed, 1 insertions, 0 deletions
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index eb4b96eb47..973f1e2f11 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -216,6 +216,7 @@ object SparkBuild extends Build {
       "net.java.dev.jets3t" % "jets3t" % "0.7.1",
       "org.apache.avro" % "avro" % "1.7.4",
       "org.apache.avro" % "avro-ipc" % "1.7.4" excludeAll(excludeNetty),
+      "org.apache.zookeeper" % "zookeeper" % "3.4.5" excludeAll(excludeNetty),
       "com.codahale.metrics" % "metrics-core" % "3.0.0",
       "com.codahale.metrics" % "metrics-jvm" % "3.0.0",
       "com.codahale.metrics" % "metrics-json" % "3.0.0",