author: mcheah <mcheah@palantir.com> 2014-10-20 11:35:18 -0700
committer: Josh Rosen <joshrosen@databricks.com> 2014-10-20 11:35:18 -0700
commit: 4afe9a4852ebeb4cc77322a14225cd3dec165f3f
tree: dde032f9763a07cff5ca59b92730b8cc7def4e8a /python/pyspark
parent: ea054e1fc70e09e0babcdae2a37f6f7aa6a035f2
[SPARK-3736] Workers reconnect when disassociated from the master.
Previously, if the master node was killed and restarted, the worker nodes did not attempt to reconnect to it, so the workers had to be restarted as well. Now, when the master is disconnected, the worker nodes continuously ping the master in an attempt to reconnect. Once the master restarts, it detects one of the registration requests from its former workers, and the cluster re-enters a healthy state.

In addition, a worker was removed when the master did not receive a heartbeat from it; however, if that worker later sent a heartbeat, the master ignored it. Now, a master that receives a heartbeat from a worker that had been disconnected requests that the worker re-attempt the registration process, at which point the worker sends a RegisterWorker request and is reconnected.

Each worker submits a reconnection attempt every N seconds, where N is configured by the property spark.worker.reconnect.interval; the default is currently 60 seconds.

Author: mcheah <mcheah@palantir.com>

Closes #2828 from mccheah/reconnect-dead-workers and squashes the following commits:

83f8bc9 [mcheah] [SPARK-3736] More informative log message, and fixing some indentation.
fe0e02f [mcheah] [SPARK-3736] Moving reconnection logic to registerWithMaster().
94ddeca [mcheah] [SPARK-3736] Changing a log warning to a log info.
a698e35 [mcheah] [SPARK-3736] Addressing PR comment to make some defs private.
b9a3077 [mcheah] [SPARK-3736] Addressing PR comments related to reconnection.
2ad5ed5 [mcheah] [SPARK-3736] Cancel attempts to reconnect if the master changes.
b5b34af [mcheah] [SPARK-3736] Workers reconnect when disassociated from the master.
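
The reconnection behaviour described in this message can be sketched as a small, deterministic simulation. This is only an illustration in Python, not the Scala code changed by the commit: the classes `Master` and `Worker`, the reply strings, and the helper `reconnect_until_registered` are all hypothetical names. The only values taken from the message are the 60-second default interval and the idea that a heartbeat from an unknown worker triggers a new registration attempt.

```python
from dataclasses import dataclass, field

# Default of spark.worker.reconnect.interval according to the commit message.
RECONNECT_INTERVAL_S = 60


@dataclass
class Master:
    """Toy master (hypothetical): tracks only the ids of registered workers."""
    alive: bool = True
    workers: set = field(default_factory=set)

    def register_worker(self, worker_id):
        # Stand-in for handling a RegisterWorker request.
        if not self.alive:
            return False
        self.workers.add(worker_id)
        return True

    def heartbeat(self, worker_id):
        # A heartbeat from an unknown worker is answered with a request to
        # re-register instead of being silently ignored.
        if not self.alive:
            return "unreachable"
        return "ok" if worker_id in self.workers else "reconnect"


@dataclass
class Worker:
    """Toy worker (hypothetical) that counts its registration attempts."""
    worker_id: str
    registered: bool = False
    attempts: int = 0

    def try_register(self, master):
        self.attempts += 1
        self.registered = master.register_worker(self.worker_id)
        return self.registered

    def send_heartbeat(self, master):
        reply = master.heartbeat(self.worker_id)
        if reply != "ok":
            self.registered = False
        if reply == "reconnect":
            self.try_register(master)
        return reply


def reconnect_until_registered(worker, master, restart_at_s, max_wait_s):
    """Retry every RECONNECT_INTERVAL_S simulated seconds.

    The master is treated as down until restart_at_s. Returns the simulated
    time of the successful attempt, or None if max_wait_s is exceeded.
    """
    elapsed = 0
    while elapsed <= max_wait_s:
        master.alive = elapsed >= restart_at_s
        if worker.try_register(master):
            return elapsed
        elapsed += RECONNECT_INTERVAL_S
    return None
```

With a master that comes back 150 simulated seconds after the outage begins, the worker in this sketch fails at 0, 60, and 120 seconds and registers on the attempt at 180 seconds. A real worker would schedule these attempts on a timer rather than loop over simulated time.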
Diffstat (limited to 'python/pyspark')
0 files changed, 0 insertions, 0 deletions