gRPC client tries to reconnect the subchannel only about 20 seconds after I make the gRPC server's network unreachable
Please answer these questions before submitting your issue.
What version of gRPC are you using?
gRPC 1.5.0
What JVM are you using (java -version)?
JDK 1.8.0
What did you do?
If possible, provide a recipe for reproducing the error.
1. Create a gRPC client and a gRPC server on different hosts.
2. Send a message from the client to the server once per second.
3. Make the server unreachable from the client by unplugging the server's network cable.
What did you expect to see?
I expect the client to detect that the server is unavailable as quickly as possible, rather than waiting about 20 seconds to find out.
What did you see instead?
NettyClientHandler calls onConnectionError only after about 20 seconds. The Throwable it catches is:
Status{code=UNAVAILABLE, description=null, cause=java.io.IOException: The remote host forced an existing connection to be closed
ManagedChannelImpl then runs handleSubchannelState, moves the subchannel to IDLE, and tries to reconnect. But I don't understand the mechanism: how does the client detect that it has lost its network connection to the server? And is there a way for the client to detect that the server is unavailable more quickly?
Top GitHub Comments
The ~20 second timeout is likely coming from the socket timing out, which is OS-dependent. It’s possible in other environments to wait much longer when the network is dropped before the client realizes it can’t communicate with the server. You can use gRPC’s keep-alives to get a more reliable timeout here, see https://github.com/grpc/grpc-java/blob/master/netty/src/main/java/io/grpc/netty/NettyChannelBuilder.java#L278. The current minimum keep-alive interval is 10 seconds, see https://github.com/grpc/grpc-java/blob/master/core/src/main/java/io/grpc/internal/KeepAliveManager.java#L231.
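To make the keep-alive idea above concrete, here is a minimal, self-contained sketch of the detection logic. It is a simplified model written for illustration and is not grpc-java's actual KeepAliveManager. The class name KeepAliveModel, its methods, and the 10 s / 5 s values are all hypothetical. In grpc-java itself the two durations are set with keepAliveTime(...) and keepAliveTimeout(...) on NettyChannelBuilder.

```java
/**
 * Simplified model of keep-alive dead-peer detection (illustrative only).
 * All times are in milliseconds and are supplied by the caller, so the
 * logic can be followed without real timers or sockets.
 */
final class KeepAliveModel {
    enum Action { NONE, SEND_PING, CLOSE_CONNECTION }

    private final long keepAliveTimeMillis;    // quiet period before a ping is sent
    private final long keepAliveTimeoutMillis; // how long to wait for the ping ack
    private long lastReadMillis;               // last time any data arrived from the peer
    private long pingSentMillis;               // only meaningful while pingOutstanding
    private boolean pingOutstanding;

    KeepAliveModel(long keepAliveTimeMillis, long keepAliveTimeoutMillis, long nowMillis) {
        this.keepAliveTimeMillis = keepAliveTimeMillis;
        this.keepAliveTimeoutMillis = keepAliveTimeoutMillis;
        this.lastReadMillis = nowMillis;
    }

    /** Any inbound data (including a ping ack) proves the peer is reachable. */
    void onDataReceived(long nowMillis) {
        lastReadMillis = nowMillis;
        pingOutstanding = false;
    }

    /** Called periodically; decides what the transport should do at this instant. */
    Action onTick(long nowMillis) {
        if (pingOutstanding) {
            // No ack within the timeout: treat the connection as dead.
            return nowMillis - pingSentMillis >= keepAliveTimeoutMillis
                    ? Action.CLOSE_CONNECTION
                    : Action.NONE;
        }
        if (nowMillis - lastReadMillis >= keepAliveTimeMillis) {
            pingOutstanding = true;
            pingSentMillis = nowMillis;
            return Action.SEND_PING;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        // Hypothetical settings: ping after 10 s of silence, wait 5 s for the ack.
        KeepAliveModel model = new KeepAliveModel(10_000, 5_000, 0);
        for (long t = 0; t <= 15_000; t += 5_000) {
            System.out.println(t + " ms: " + model.onTick(t));
        }
    }
}
```

Under this model, a peer that silently disappears is noticed roughly keepAliveTime + keepAliveTimeout after the last inbound data, regardless of how long the OS takes to report the socket error.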
getState(true) will trigger a connection without an RPC. If the connection fails, then the Channel state will become TRANSIENT_FAILURE.
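As a rough sketch of how getState(true) might be used to watch the channel, the fragment below polls the state once and re-registers a callback for the next change. It assumes grpc-java is on the classpath and that the channel implementation supports getState and notifyWhenStateChanged. The class name ChannelStateWatcher is hypothetical, and the println stands in for whatever reaction the application needs.

```java
import io.grpc.ConnectivityState;
import io.grpc.ManagedChannel;

final class ChannelStateWatcher {
    /** Prints the current state, then waits for the next state change. */
    static void watch(ManagedChannel channel) {
        // Passing true asks an IDLE channel to start connecting even though no RPC is pending.
        ConnectivityState state = channel.getState(true);
        System.out.println("channel state: " + state);

        // The callback is one-off, so it re-registers itself on every change.
        if (state != ConnectivityState.SHUTDOWN) {
            channel.notifyWhenStateChanged(state, () -> watch(channel));
        }
    }
}
```

Calling ChannelStateWatcher.watch(channel) once after building the channel should be enough to observe a transition to TRANSIENT_FAILURE when a connection attempt fails.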