SPARK-2282: Reuse Socket for sending accumulator updates to Pyspark - spark

author	Aaron Davidson <aaron@databricks.com>	2014-07-31 15:31:53 -0700
committer	Josh Rosen <joshrosen@apache.org>	2014-07-31 15:31:53 -0700
commit	ef4ff00f87a4e8d38866f163f01741c2673e41da (patch)
tree	e5674bacee57daaaca98201703197bee0c94a4b8 /sql
parent	492a195c5c4d68c85b8b1b48e3aa85165bbb5dc3 (diff)

SPARK-2282: Reuse Socket for sending accumulator updates to Pyspark

Prior to this change, every PySpark task completion opened a new socket to the accumulator server, sent its updates over it, and then closed it. I'm not entirely sure why PySpark always sends accumulator updates, but regardless, this causes a very rapid buildup of ephemeral TCP connections that remain in the TIME_WAIT state for around a minute before being cleaned up.

Rather than trying to allow these sockets to be cleaned up faster, this patch simply reuses the connection between task completions (since the DAGScheduler feeds them updates in a single-threaded manner anyway). The only tricky part was making sure that the AccumulatorServer could shut down in a timely manner (i.e., stop polling for new data), and this was accomplished via minor feats of magic.

I have confirmed that this patch eliminates the buildup of ephemeral sockets caused by the accumulator updates. However, I did notice that a significant number of sockets were still being created against the PySpark daemon port, though my machine could not create them fast enough to cause a failure. This may not be the last we see of this issue.

Author: Aaron Davidson <aaron@databricks.com>

Closes #1503 from aarondav/accum and squashes the following commits:

b3e12f7 [Aaron Davidson] SPARK-2282: Reuse Socket for sending accumulator updates to Pyspark
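The connection-reuse idea can be illustrated with a minimal Python sketch. This is not the code from the patch. It assumes a hypothetical wire format: an int batch size, then length-prefixed pickled `(accumulator id, delta)` pairs, answered by a one-byte acknowledgement. `UpdateHandler`, `UpdateServer`, `UpdateSender` and `run_demo` are illustrative names, and the sender stands in for the JVM side. The handler polls with `select` and a short timeout, so `shutdown()` can stop it even while a connection is still open. That timely shutdown is the tricky part the message mentions.

```python
import pickle
import select
import socket
import socketserver
import struct
import threading


class UpdateHandler(socketserver.StreamRequestHandler):
    """Applies many batches of updates arriving over ONE long-lived connection."""

    def handle(self):
        self.server.connections += 1
        while not self.server.shutting_down:
            # Poll with a short timeout instead of blocking in read(), so the
            # loop notices a shutdown request even while the client stays connected.
            ready, _, _ = select.select([self.connection], [], [], 0.1)
            if not ready:
                continue
            header = self.rfile.read(4)
            if len(header) < 4:  # sender closed the connection
                return
            (num_updates,) = struct.unpack("!i", header)
            for _ in range(num_updates):
                (length,) = struct.unpack("!i", self.rfile.read(4))
                # pickle is only acceptable here because the peer is trusted.
                acc_id, delta = pickle.loads(self.rfile.read(length))
                totals = self.server.totals
                totals[acc_id] = totals.get(acc_id, 0) + delta
            # One-byte acknowledgement; the sender waits for it before reusing
            # the socket for the next batch.
            self.wfile.write(b"\x01")


class UpdateServer(socketserver.TCPServer):
    """Single-threaded: serve_forever() sits inside handle() while a client is connected."""

    def __init__(self, address):
        super().__init__(address, UpdateHandler)
        self.totals = {}
        self.connections = 0
        self.shutting_down = False

    def shutdown(self):
        # Raise the flag first so handle() leaves its poll loop; without it,
        # TCPServer.shutdown() would wait until the client disconnects.
        self.shutting_down = True
        super().shutdown()
        self.server_close()


class UpdateSender:
    """Stands in for the JVM-side sender: one socket, reused for every task."""

    def __init__(self, address):
        self.sock = socket.create_connection(address)

    def send(self, updates):
        parts = [struct.pack("!i", len(updates))]
        for acc_id, delta in updates:
            body = pickle.dumps((acc_id, delta))
            parts.append(struct.pack("!i", len(body)) + body)
        self.sock.sendall(b"".join(parts))
        if self.sock.recv(1) != b"\x01":
            raise ConnectionError("server did not acknowledge the batch")

    def close(self):
        self.sock.close()


def run_demo(num_tasks):
    server = UpdateServer(("127.0.0.1", 0))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    sender = UpdateSender(server.server_address)
    for task in range(num_tasks):
        # Each simulated task completion reuses the same connection.
        sender.send([(0, 1), (1, task)])
    # Shut down while the sender is still connected; the handler's poll loop
    # exits within roughly one select() timeout.
    server.shutdown()
    sender.close()
    return server.totals, server.connections
```

Calling `run_demo(100)` returns `({0: 100, 1: 4950}, 1)`. That is a hundred batches over a single accepted connection, instead of a hundred short-lived sockets each lingering in TIME_WAIT. The cost of polling is that shutdown can take up to one `select` timeout (0.1 s in this sketch). In exchange, the handler never blocks indefinitely in `read()`.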

Diffstat (limited to 'sql')

0 files changed, 0 insertions, 0 deletions
