Likely invalid handling of the goaway frames

Problem description

We have sporadically seen streams that hang forever. We think the cause could be the way grpc-node currently handles GOAWAY frames, which seems to differ from other implementations such as the Go one [1].

In our scenario the streams are kept open indefinitely and we expect them to be recovered when they fail. We use heartbeats as a defensive mechanism against TCP connections that are no longer usable, for whatever reason, and were not closed explicitly by the server.

As we understand it, the current code cannot guarantee that heartbeats are in use in some scenarios, specifically during a GOAWAY event [2], which immediately stops [3] the handlers that ensure the session is proactively closed when no heartbeats have been seen for some time.

In our opinion this is a problem: during that window, TCP connections that are no longer usable are not protected by the heartbeats, so the heartbeats will never kick in to close the connection on the client side.
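
The gap described above can be illustrated with a small model. This is only a sketch of the reported behaviour, not the grpc-js implementation: `KeepaliveWatchdog` and all of its members are hypothetical names, and the 20-second timeout is an assumed value.

```typescript
// Simplified model of a keepalive watchdog; NOT grpc-js code.
// KeepaliveWatchdog and its members are hypothetical names.
class KeepaliveWatchdog {
  private readonly timeoutMs: number;
  private lastAckMs: number;
  private running = true;

  constructor(timeoutMs: number, startMs: number) {
    this.timeoutMs = timeoutMs;
    this.lastAckMs = startMs;
  }

  // Record that the peer answered a heartbeat.
  onPingAck(nowMs: number): void {
    this.lastAckMs = nowMs;
  }

  // Models what the issue reports: the handlers are stopped on GOAWAY.
  stop(): void {
    this.running = false;
  }

  // True when the connection should be closed proactively.
  shouldClose(nowMs: number): boolean {
    return this.running && nowMs - this.lastAckMs > this.timeoutMs;
  }
}

// While the watchdog runs, a silent peer is detected after the timeout.
const active = new KeepaliveWatchdog(20_000, 0);
const detectedWhileRunning = active.shouldClose(30_000);

// Once stopped, the same silent peer is never detected,
// no matter how much time passes.
const stopped = new KeepaliveWatchdog(20_000, 0);
stopped.stop();
const detectedAfterStop = stopped.shouldClose(3_600_000);
```

In this model an open stream on the stopped connection has nothing left that would close it, which matches the hang we observe.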

Further implication of this

This strategy has another implication that goes beyond the bug itself. When a GOAWAY frame is consumed, the subchannel connection is marked as IDLE, and this state is also propagated to getConnectivityState, which reports IDLE to the caller (the user of the API). From our understanding [4], IDLE means that there are no outstanding RPCs, which is not true in the case of a GOAWAY frame, where there is still at least one outstanding RPC.
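
A minimal sketch of this mismatch, under the assumption that the state changes exactly as described above. `SubchannelModel` and its members are hypothetical names used for illustration and do not come from grpc-js.

```typescript
// Simplified model of the state transition discussed here; NOT grpc-js code.
// SubchannelModel and its members are hypothetical names.
type ConnectivityState =
  | "IDLE"
  | "CONNECTING"
  | "READY"
  | "TRANSIENT_FAILURE"
  | "SHUTDOWN";

class SubchannelModel {
  state: ConnectivityState = "READY";
  activeCalls = 0;

  startCall(): void {
    this.activeCalls += 1;
  }

  endCall(): void {
    this.activeCalls = Math.max(0, this.activeCalls - 1);
  }

  // On GOAWAY the state drops to IDLE regardless of in-flight calls.
  onGoaway(): void {
    this.state = "IDLE";
  }
}

const subchannel = new SubchannelModel();
subchannel.startCall(); // a long-lived stream is open
subchannel.onGoaway(); // the server (or proxy) sends GOAWAY

// The caller now sees IDLE although one RPC is still outstanding.
const reportedState = subchannel.state;
const outstandingCalls = subchannel.activeCalls;
```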

[1] https://github.com/grpc/grpc-go/blob/master/internal/transport/http2_client.go#L1199
[2] https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/src/subchannel.ts#L525
[3] https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/src/subchannel.ts#L718
[4] https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md

Reproduction steps

We reproduced this by placing Nginx in front of a simple gRPC server and using the keepalive_time and grpc_read_timeout directives to reproduce the following:

  • When keepalive_time (say, configured to 120s) is reached, the next RPC receives the GOAWAY frame.
  • After the RPC that got the GOAWAY finishes, Nginx proactively closes the TCP connection.
  • We also set grpc_read_timeout to something like 60s to proactively stop streams that have had no traffic at all.

With Nginx configured and a gRPC streaming server behind it, start a gRPC streaming client and reopen the same stream each time it is closed, which will happen because of grpc_read_timeout. Eventually, when keepalive_time kicks in, the heartbeats stop working. If at that point you add a firewall rule (iptables, pf, etc.) that drops every response from the server to the client, the client won't notice and the stream will hang forever.

Environment

  • OS name, version and architecture: mac
  • Node version: v16.16.0
  • Node installation method: nvm
  • Package name and version: 1.6.8

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
pfreixes commented, Jul 28, 2022

To reproduce, the GOAWAY code path [1] needs to be triggered. I did that by putting Nginx in front of a gRPC streaming server (we use Go, but any language would do).

Here is the Nginx configuration that I used; keepalive_time does the trick of sending the GOAWAY frame:

worker_processes  1;
error_log /dev/stdout debug;
master_process off;
daemon off;

events {
    worker_connections  1024;
}


http {
    include       mime.types;
    default_type  application/octet-stream;
    access_log /dev/stdout;
    sendfile        on;

    upstream backend {
        server 127.0.0.1:8081;
    }

    server {
        listen       8082 http2;
        server_name  localhost;
        keepalive_time 120s;

        location / {
            grpc_pass grpc://backend;
            grpc_socket_keepalive on;
            grpc_read_timeout 60s;
        }
    }
    include servers/*;
}

Once Nginx and the gRPC backend server are running, it is a matter of hitting the gRPC port exposed by Nginx with a gRPC client that uses the grpc-js package (the latest version is fine) with heartbeats enabled. Once started, the client needs to open a stream and, every time the stream is closed because of grpc_read_timeout, simply retry it.

You will notice that between the GOAWAY event and the next RPC, which triggers the opening of a new TCP connection, the heartbeats basically disappear.
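
For reference, a sketch of how the client-side heartbeats mentioned above might be enabled. `grpc.keepalive_time_ms` and `grpc.keepalive_timeout_ms` are channel options of grpc-js; the helper `buildKeepaliveOptions`, the numeric values, and the client name in the comment are assumptions for illustration only.

```typescript
// Channel options used by @grpc/grpc-js for HTTP/2 keepalive pings.
type KeepaliveOptions = {
  "grpc.keepalive_time_ms": number; // interval between pings
  "grpc.keepalive_timeout_ms": number; // how long to wait for the ping ack
};

// buildKeepaliveOptions is a hypothetical helper, not part of grpc-js.
function buildKeepaliveOptions(
  intervalMs: number,
  timeoutMs: number,
): KeepaliveOptions {
  if (intervalMs <= 0 || timeoutMs <= 0) {
    throw new RangeError("keepalive interval and timeout must be positive");
  }
  return {
    "grpc.keepalive_time_ms": intervalMs,
    "grpc.keepalive_timeout_ms": timeoutMs,
  };
}

// Assumed values: ping every 10s, give up after 5s without an ack.
const keepaliveOptions = buildKeepaliveOptions(10_000, 5_000);

// With @grpc/grpc-js installed, the object would be passed as the third
// constructor argument of a generated client (MyServiceClient is a
// placeholder name), for example:
//   new MyServiceClient("localhost:8082",
//     grpc.credentials.createInsecure(), keepaliveOptions);
```

The port 8082 in the comment matches the listen directive of the Nginx configuration above.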

[1] https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/src/subchannel.ts#L525

0 reactions
murgatroid99 commented, Jul 29, 2022

The Connectivity Semantics doc is not authoritative. It is out of date and is only used as a rough guideline for implementation. All current implementations switch to IDLE when receiving a GOAWAY, whether or not there are active RPCs.
