author | Marcelo Vanzin <vanzin@cloudera.com> | 2015-11-25 12:58:18 -0800 |
---|---|---|
committer | Marcelo Vanzin <vanzin@cloudera.com> | 2015-11-25 12:58:18 -0800 |
commit | 4e81783e92f464d479baaf93eccc3adb1496989a | |
tree | 6ba31cd598671110d0e38f0930d36f358cd9b82d /network/yarn | |
parent | d29e2ef4cf43c7f7c5aa40d305cf02be44ce19e0 | |
[SPARK-11866][NETWORK][CORE] Make sure timed out RPCs are cleaned up.
This change does a couple of different things to make sure that the RpcEnv-level
code and the network library agree about the status of outstanding RPCs.
For RPCs that do not expect a reply ("RpcEnv.send"), support for one-way
messages (hello CORBA!) was added to the network layer. This is a
"fire and forget" message that does not require any state to be kept
by the TransportClient; as a result, the RpcEnv 'Ack' message is not needed
anymore.
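A minimal Java sketch of why one-way messages need no client-side bookkeeping. The class `SketchClient` and its methods are hypothetical names chosen for illustration, not the actual `TransportClient` API. A request that expects a reply must keep its callback until the reply arrives; a one-way message is written and then forgotten, so there is nothing left to acknowledge or clean up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of a transport client; names are illustrative only.
class SketchClient {
    // Callbacks for requests still awaiting a reply, keyed by request id.
    private final Map<Long, Consumer<String>> outstanding = new HashMap<>();
    private long nextId = 0;

    // One-way ("fire and forget") message: written out, no state retained.
    void sendOneWay(String message) {
        write(message);
    }

    // Request/response: the callback is remembered until the reply arrives.
    long sendRpc(String message, Consumer<String> onReply) {
        long id = nextId++;
        outstanding.put(id, onReply);
        write(id + ":" + message);
        return id;
    }

    // Invoked when the peer's reply for `id` arrives; drops the entry.
    void handleReply(long id, String reply) {
        Consumer<String> cb = outstanding.remove(id);
        if (cb != null) {
            cb.accept(reply);
        }
    }

    int outstandingCount() {
        return outstanding.size();
    }

    // Stand-in for the real network write (assumption: no I/O in this sketch).
    private void write(String frame) {
    }
}
```

Sending only one-way messages leaves `outstandingCount()` at zero, which is why a separate acknowledgement round trip is unnecessary for that case.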
For RPCs that do expect a reply ("RpcEnv.ask"), the network library now
returns the internal RPC id; if the RpcEnv layer decides to time out the
RPC before the network layer does, it now asks the TransportClient to
forget about the RPC, so that if the network-level timeout occurs, the
client is not killed.
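The interplay between the two timeouts can be modeled roughly as follows. This is a simplified, hypothetical sketch: `TimeoutSketchClient`, `removeRpcRequest`, and `checkTimeouts` are illustrative names, and the real network layer detects timeouts differently in detail. It only shows the idea that once the upper layer has forgotten a request, a later network-level check no longer sees it as stuck and does not close the client.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a client that tracks outstanding requests by id.
class TimeoutSketchClient {
    // Request id -> time (ms) at which the request was sent.
    private final Map<Long, Long> outstanding = new HashMap<>();
    private long nextId = 0;
    private boolean closed = false;

    // Registers a request and returns its internal id to the caller.
    long sendRpc(long nowMs) {
        long id = nextId++;
        outstanding.put(id, nowMs);
        return id;
    }

    // Upper layer gave up on the request: forget it so the transport
    // will not treat it as a stuck request later.
    void removeRpcRequest(long id) {
        outstanding.remove(id);
    }

    // Network-level check: if any request has waited longer than
    // timeoutMs, assume the connection is dead and close the client.
    void checkTimeouts(long nowMs, long timeoutMs) {
        for (long sentAt : outstanding.values()) {
            if (nowMs - sentAt > timeoutMs) {
                closed = true;
                return;
            }
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

For example, if a request is sent at t=0 and the network check runs at t=200 with a 120 ms limit, the client is closed; if the upper layer calls `removeRpcRequest` first (say at t=60), the same check leaves the client open.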
As part of implementing the above, I cleaned up some of the code in the
netty rpc backend, removing types that were not necessary and factoring
out some common code. Of interest is a slight change in the exceptions
when posting messages to a stopped RpcEnv; that's mostly to avoid nasty
error messages from the local-cluster backend when shutting down, which
pollute the terminal output.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #9917 from vanzin/SPARK-11866.
Diffstat (limited to 'network/yarn')
0 files changed, 0 insertions, 0 deletions