H2 connection starts to deadline every request on a connection after random interval
We are mirroring some gRPC production traffic through linkerd, and after a random interval (anywhere from 30 seconds to 2 hours) every request over the connection starts to deadline. After this point, all of the request charts in the admin UI go to zero. If the connection is recreated, by restarting either linkerd or the client service, traffic flow is temporarily restored.
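For context, "starts to deadline" means the standard per-call gRPC deadline expires on the client for every request. A minimal sketch of the client side, assuming a grpc-java blocking stub — the GreeterGrpc/HelloRequest stubs and port 4140 are purely illustrative and not taken from this report:

```scala
import java.util.concurrent.TimeUnit
import io.grpc.{ManagedChannelBuilder, StatusRuntimeException}

object DeadlineSketch extends App {
  // Assumed: the client talks to its local linkerd, which proxies to the server.
  val channel = ManagedChannelBuilder
    .forAddress("localhost", 4140) // illustrative linkerd port, not from the report
    .usePlaintext()
    .build()

  // Hypothetical generated stubs; each call carries its own deadline.
  val stub = GreeterGrpc
    .newBlockingStub(channel)
    .withDeadlineAfter(5, TimeUnit.SECONDS)

  try stub.sayHello(HelloRequest.newBuilder().setName("ping").build())
  catch {
    case e: StatusRuntimeException =>
      // In the bad state, every call ends up here with DEADLINE_EXCEEDED even
      // though linkerd is still ACKing at the TCP level; the client then sends
      // RST_STREAM for the expired stream.
      println(e.getStatus)
  }
}
```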
There are no abnormal log messages, even with TRACE verbosity turned on, when linkerd gets into this state.
- We did a tcpdump to verify that the traffic was reaching linkerd (it is). We observed that the client service sends h2 request frames (DATA, HEADERS) over the stream and then, after the deadline interval, sends a RST_STREAM, which is the intended behavior. During this interval, linkerd sends nothing other than TCP ACK packets.
- We also ran a test without linkerd to see if it works (it does).
- We tried disabling failure accrual on both the client linkerd and the server linkerd (see the config sketch after this list); it didn't make a difference.
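For reference, disabling failure accrual in a linkerd 1.x router config is roughly the stanza sketched below; the surrounding router settings (port, `experimental` flag) are assumptions for illustration, not the reporter's actual config.

```yaml
routers:
- protocol: h2
  experimental: true        # h2 support was experimental in linkerd 1.x at the time
  client:
    failureAccrual:
      kind: none            # turn failure accrual off for this router's clients
  servers:
  - port: 4142              # illustrative server port only
```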
Metrics snapshot below.
Top GitHub Comments
Ok, I was able to track down the issue and have put together a fix in #1280. I've also published a docker image from that branch to buoyantio/linkerd:h2-fix. @zackangelo, @kenkouot, if you have a chance, can you verify that that image fixes the issue in your environments?

Still don't have a fix for this issue, but I wanted to provide another update.
In my test setup, I’m running an h2 router on port 6262, which forwards requests to a gRPC server running locally on port 8282. When looking at request patterns that trigger the error described in this issue, the bad behavior appears to be happening in the 8282 client.
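A minimal linkerd 1.x config along the lines of that test setup might look like the sketch below; the dtab, label, and `experimental` flag are assumptions for illustration, since the actual config isn't shown in the issue.

```yaml
routers:
- protocol: h2
  experimental: true
  label: h2-test
  servers:
  - port: 6262              # the h2 router described above
    ip: 0.0.0.0
  # Route every request to the local gRPC server on 8282 (illustrative dtab).
  dtab: |
    /svc => /$/inet/127.1/8282;
```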
The 8282 client has 2 different patterns of state transitions. The most common one is:
A less common pattern (roughly 15% of requests in a random sample) is:
Both of these patterns result in successful requests, but only the second pattern ever triggers the error in this issue. When the issue is triggered, the final inbound HEADER frame is received by the 8282 client but is never sent as an outbound HEADER frame on the 6262 server. Messages are passed from the client to the server using util's AsyncQueue. In the error situation, the queue refuses the final .offer of the last HEADER frame, but it is not clear to me why the offer is refused. This looks like a race condition in which the queue is reset before the offer is made, but I can't determine where the reset is coming from. Will keep investigating.
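For readers unfamiliar with util's AsyncQueue, a minimal sketch (not linkerd's actual code) of how an offer can be refused: once the queue has been failed, every subsequent `offer` returns false, which would silently drop the trailing frame exactly as described above.

```scala
import com.twitter.concurrent.AsyncQueue
import com.twitter.util.Future

object OfferRaceSketch extends App {
  val q = new AsyncQueue[String]()

  q.offer("HEADERS")                   // true: accepted and handed to the reader
  val read: Future[String] = q.poll()  // satisfied immediately with "HEADERS"

  // If something resets/fails the queue before the final frame is offered...
  q.fail(new Exception("stream reset"))

  // ...the trailing frame is refused: offer returns false and the frame is
  // never forwarded, matching the behavior described in the comment.
  println(q.offer("TRAILERS"))         // false
}
```

If some reset path fails the per-stream queue concurrently with the loop that offers the final HEADER frame, that would reproduce the refused offer; this only restates the race hypothesized above, not a confirmed cause.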