Likely invalid handling of the goaway frames
Problem description
We have sporadically suffered from streams that hang forever. We think the cause could be the way `goaway` frames are currently handled by grpc-node, which seems to differ from other implementations such as the Go one [1].
In our scenario the streams are kept open indefinitely and we expect them to be recovered when they fail. We use heartbeats as a defensive mechanism against TCP connections that are no longer usable, for whatever reason, and that were not closed explicitly by the server.
From our understanding, the current code cannot guarantee that heartbeats are used in some scenarios, specifically during a `goaway` event [2], which immediately stops [3] the handlers that ensure the session is proactively closed when no heartbeats are seen for some time.
In our opinion this is a problem: during that window of time, TCP connections that are no longer usable are unprotected, since the heartbeats will never kick in to close the connection on the client side.
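To make the concern concrete, here is a minimal, self-contained TypeScript sketch of the behaviour we would expect. It is not grpc-js code: `KeepaliveWatchdog` and all of its members are hypothetical names, and time is passed in explicitly so the logic can be followed without real timers. The idea it illustrates is that a `goaway` blocks new streams but leaves the heartbeat check armed while streams are still active.

```typescript
// Hypothetical model for illustration only, not the grpc-js implementation.
type Decision = "keep" | "close";

class KeepaliveWatchdog {
  private timeoutMs: number;
  private lastAckMs: number;
  private activeStreams = 0;
  private goawayReceived = false;

  constructor(timeoutMs: number, nowMs: number) {
    this.timeoutMs = timeoutMs;
    this.lastAckMs = nowMs;
  }

  // Returns false when the connection no longer accepts new streams.
  streamStarted(): boolean {
    if (this.goawayReceived) return false;
    this.activeStreams += 1;
    return true;
  }

  streamEnded(): void {
    if (this.activeStreams > 0) this.activeStreams -= 1;
  }

  // The goaway only blocks new streams; the watchdog stays armed.
  onGoaway(): void {
    this.goawayReceived = true;
  }

  onPingAck(nowMs: number): void {
    this.lastAckMs = nowMs;
  }

  check(nowMs: number): Decision {
    if (this.activeStreams === 0) {
      // A drained connection that received a goaway can be closed.
      return this.goawayReceived ? "close" : "keep";
    }
    // Streams are still active: enforce the heartbeat timeout.
    return nowMs - this.lastAckMs > this.timeoutMs ? "close" : "keep";
  }
}

const watchdog = new KeepaliveWatchdog(20000, 0);
watchdog.streamStarted();
watchdog.onGoaway();
watchdog.onPingAck(5000);
console.log(watchdog.check(10000)); // keep
console.log(watchdog.check(30000)); // close (25s without an ack, limit is 20s)
```

In this model a connection whose peer has silently disappeared after the `goaway` is still closed once the timeout elapses, which is the protection we are missing today.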
Further implication of this
There is another implication of this strategy that goes beyond the bug itself. When a `goaway` frame is consumed, the subchannel connection is marked as `IDLE`, and this state is also propagated to `getConnectivityState`, reporting to the caller (the user of the API) that the status is `IDLE`. From our understanding [4], `IDLE` means that there are no outstanding RPCs, which is not true in the case of a `goaway` frame, where there is still at least one outstanding RPC.
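The mismatch can be illustrated with a small sketch. Nothing here imports grpc-js: `ChannelLike` is a local stand-in that only mimics the shape of `getConnectivityState`, and `OutstandingStreams` and `describeChannel` are hypothetical helpers. It shows a caller that counts its own open streams and therefore sees `IDLE` reported while a stream is still outstanding.

```typescript
// Local stand-ins for illustration; state names follow the gRPC connectivity states.
type State = "IDLE" | "CONNECTING" | "READY" | "TRANSIENT_FAILURE" | "SHUTDOWN";

interface ChannelLike {
  getConnectivityState(tryToConnect: boolean): State;
}

// Hypothetical helper: counts streams the application itself has opened.
class OutstandingStreams {
  private count = 0;

  opened(): void {
    this.count += 1;
  }

  closed(): void {
    if (this.count > 0) this.count -= 1;
  }

  get size(): number {
    return this.count;
  }
}

// Reports the channel state, flagging IDLE while streams are still open.
function describeChannel(channel: ChannelLike, streams: OutstandingStreams): string {
  const state = channel.getConnectivityState(false);
  if (state === "IDLE" && streams.size > 0) {
    return `IDLE with ${streams.size} outstanding stream(s)`;
  }
  return state;
}

// Simulated situation after a goaway: the channel says IDLE, one stream is still open.
const channelAfterGoaway: ChannelLike = { getConnectivityState: () => "IDLE" };
const openStreams = new OutstandingStreams();
openStreams.opened();
console.log(describeChannel(channelAfterGoaway, openStreams)); // IDLE with 1 outstanding stream(s)
```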
[1] https://github.com/grpc/grpc-go/blob/master/internal/transport/http2_client.go#L1199
[2] https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/src/subchannel.ts#L525
[3] https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/src/subchannel.ts#L718
[4] https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md
Reproduction steps
We reproduced this by placing Nginx in front of a simple gRPC server and using the `keepalive_time` and `grpc_read_timeout` directives to reproduce the following:
- When the `keepalive_time` (say, configured to 120s) is reached, the next RPC receives the `goaway` frame.
- After the RPC that got the `goaway` finishes, Nginx proactively closes the TCP connection.
- We also set `grpc_read_timeout` to something like `60s` to proactively stop streams that have not had any traffic at all.
With Nginx configured and a gRPC stream server behind it, you can start a gRPC stream client and reopen the same stream each time it gets closed, which will happen because of the `grpc_read_timeout`. Eventually, when the `keepalive_time` kicks in, the heartbeats stop working. If at that point you add a rule (iptables, pf, etc.) that drops any response from the server to the client, the client won't notice and the stream will hang forever.
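The client side of these steps can be sketched as a loop that reopens the stream whenever it ends. This is only an outline under stated assumptions: `openStream` is a hypothetical callback standing in for whatever opens the gRPC stream and settles when it ends, and `reopenOnClose` is not part of any library.

```typescript
// Hypothetical callback: resolves with the reason the stream ended, or rejects on error.
type OpenStream = (attempt: number) => Promise<string>;

// Opens the stream up to maxAttempts times, waiting delayMs between attempts,
// and records why each attempt ended.
async function reopenOnClose(
  openStream: OpenStream,
  maxAttempts: number,
  delayMs: number,
): Promise<string[]> {
  const reasons: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      reasons.push(await openStream(attempt));
    } catch (err) {
      reasons.push(`error: ${err instanceof Error ? err.message : String(err)}`);
    }
    if (attempt < maxAttempts) {
      // Short pause before reopening the stream.
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return reasons;
}
```

If the stream hangs as described, `openStream` never settles and the loop stalls at that attempt, which is exactly the symptom we observe.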
Environment
- OS name, version and architecture: mac
- Node version v16.16.0
- Node installation method nvm
- Package name and version 1.6.8
Issue Analytics
- State:
- Created a year ago
- Comments:7 (3 by maintainers)
Top GitHub Comments
Reproducing this requires triggering the `goaway` code path [1]. I did that by putting Nginx in front of a gRPC stream server (we use Go, but any server would do).
In the Nginx configuration that I used, `keepalive_time` does the trick of sending the `goaway` frame.
Once you have Nginx and the gRPC backend server running, it is a matter of hitting the gRPC port exposed by Nginx with a gRPC client that uses the `grpc-js` package (the latest version is fine) with heartbeats enabled. Once the client is started, it needs to open a stream and, every time the stream gets closed because of `grpc_read_timeout`, simply retry it.
You will notice that between the `goaway` event and the next RPC, which triggers the opening of a new TCP connection, the heartbeats basically disappear.
[1] https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/src/subchannel.ts#L525
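For readers unfamiliar with the heartbeat settings, the following sketch shows the kind of keepalive channel options a `grpc-js` client accepts in the options argument of its constructor. The values are illustrative assumptions, not the ones used in this reproduction, and `worstCaseDetectionMs` is a hypothetical helper that only gives a rough estimate.

```typescript
// Channel options as plain data; the values are illustrative assumptions.
const keepaliveOptions: Record<string, number> = {
  // Send a keepalive ping after this many milliseconds.
  "grpc.keepalive_time_ms": 10000,
  // Treat the connection as dead if the ping is not acknowledged within this time.
  "grpc.keepalive_timeout_ms": 5000,
  // Allow keepalive pings even when no call is active (1 = enabled).
  "grpc.keepalive_permit_without_calls": 1,
};

// Hypothetical helper: rough upper bound on how long a dead connection
// can go unnoticed while keepalive is running.
function worstCaseDetectionMs(options: Record<string, number>): number {
  const time = options["grpc.keepalive_time_ms"] ?? 0;
  const timeout = options["grpc.keepalive_timeout_ms"] ?? 0;
  return time + timeout;
}

console.log(worstCaseDetectionMs(keepaliveOptions)); // 15000
```

With heartbeats working, a dead connection should be noticed within roughly that bound; during the `goaway` window described above no such bound applies.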
The Connectivity Semantics doc is not authoritative. It is out of date and is only used as a rough guideline for implementation. All current implementations switch to IDLE when receiving a GOAWAY, whether or not there are active RPCs.