author Aaron Davidson <aaron@databricks.com> 2014-10-16 18:58:18 -0700
committer Andrew Or <andrewor14@gmail.com> 2014-10-16 18:58:18 -0700
commit 7f7b50ed9d4ffdd6b23e0faa56b068a049da67f7
tree 2539de82598b0a170c041086edc606a56ed71eb5 /core/src
parent 2fe0ba95616bb3860736b6b426635a5d2a0e9bd9
[SPARK-3923] Increase Akka heartbeat pause above heartbeat interval

Something about the 2.3.4 upgrade seems to have made the issue manifest where all the services disconnect from each other after exactly 1000 seconds (which is the heartbeat interval). [This post](https://groups.google.com/forum/#!topic/akka-user/X3xzpTCbEFs) suggests that heartbeat pause should be greater than heartbeat interval, and increasing the pause from 600s to 6000s seems to have rectified the issue. My current cluster has now exceeded 1400s of uptime without failure!

I do not know why this fixed it, because the threshold we have set for the failure detector is the exponent of a timeout, and 300 is extremely large. Perhaps the default failure detector changed in 2.3.4 and now ignores threshold.

Author: Aaron Davidson <aaron@databricks.com>

Closes #2784 from aarondav/fix-timeout and squashes the following commits:

bd1151a [Aaron Davidson] Increase pause, don't decrease interval
9cb0372 [Aaron Davidson] [SPARK-3923] Decrease Akka heartbeat interval below heartbeat pause

Diffstat (limited to 'core/src')
-rw-r--r-- core/src/main/scala/org/apache/spark/util/AkkaUtils.scala | 2
1 files changed, 1 insertions, 1 deletions
diff --git a/core/src/main/scala/org/apache/spark/util/AkkaUtils.scala b/core/src/main/scala/org/apache/spark/util/AkkaUtils.scala
index e2d32c859b..f41c8d0315 100644
--- a/core/src/main/scala/org/apache/spark/util/AkkaUtils.scala
+++ b/core/src/main/scala/org/apache/spark/util/AkkaUtils.scala
@@ -77,7 +77,7 @@ private[spark] object AkkaUtils extends Logging {
     val logAkkaConfig = if (conf.getBoolean("spark.akka.logAkkaConfig", false)) "on" else "off"
-    val akkaHeartBeatPauses = conf.getInt("spark.akka.heartbeat.pauses", 600)
+    val akkaHeartBeatPauses = conf.getInt("spark.akka.heartbeat.pauses", 6000)
     val akkaFailureDetector =
       conf.getDouble("spark.akka.failure-detector.threshold", 300.0)
     val akkaHeartBeatInterval = conf.getInt("spark.akka.heartbeat.interval", 1000)
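
The rule of thumb from the commit message (heartbeat pause should be greater than heartbeat interval) can be expressed as a small sanity check. The following is only a sketch and is not part of this patch: `HeartbeatCheck`, `getInt`, and `pauseExceedsInterval` are hypothetical names, and a plain `Map[String, String]` stands in for `SparkConf` so the snippet has no Spark dependency. The property keys and the default values (6000 and 1000, in seconds) are taken from the diff above.

```scala
// Hypothetical helper, not part of Spark: checks the relation the commit
// message relies on (heartbeat pause > heartbeat interval).
object HeartbeatCheck {
  // Defaults as they stand after this commit, in seconds.
  val DefaultPauseSecs = 6000
  val DefaultIntervalSecs = 1000

  // Minimal stand-in for SparkConf.getInt, backed by a plain Map (assumption).
  def getInt(conf: Map[String, String], key: String, default: Int): Int =
    conf.get(key).map(_.toInt).getOrElse(default)

  /** True when the configured pause is strictly greater than the interval. */
  def pauseExceedsInterval(conf: Map[String, String]): Boolean = {
    val pause = getInt(conf, "spark.akka.heartbeat.pauses", DefaultPauseSecs)
    val interval = getInt(conf, "spark.akka.heartbeat.interval", DefaultIntervalSecs)
    pause > interval
  }
}
```

With no overrides the check passes under the new defaults, while setting `spark.akka.heartbeat.pauses` back to the previous default of 600 makes it fail, which is the configuration the commit message associates with the disconnects.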