H2 connection starts to deadline every request on a connection after random interval

See original GitHub issue

We are mirroring some gRPC production traffic through linkerd, and after a random interval (anywhere from 30 seconds to 2 hours) every request over the connection starts to hit its deadline. After this point all of the request charts in the admin UI go to zero. If the connection is recreated, by restarting either linkerd or the client service, traffic flow is temporarily restored.

There are no abnormal log messages, even with TRACE verbosity turned on, when linkerd gets into this state.

  • We ran a tcpdump to verify that the traffic was reaching linkerd (it was). The client service sends h2 request frames (HEADERS, DATA) over the stream and then, after the deadline interval, sends a RST_STREAM, which is the expected cancellation behavior. During this interval nothing is sent from linkerd other than TCP ACK packets.
  • We also ran the same test without linkerd in the path to see if it works (it does).
  • We tried disabling failure accrual on both the client linkerd and the server linkerd; it didn't make a difference.

A metrics snapshot was attached to the original issue.

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 15 (13 by maintainers)

Top GitHub Comments

klingerf commented on May 8, 2017 (2 reactions)

Ok, I was able to track down the issue and have put together a fix in #1280. I’ve also published a Docker image from that branch as buoyantio/linkerd:h2-fix. @zackangelo, @kenkouot, if you have a chance, can you verify that that image fixes the issue in your environments?

klingerf commented on May 4, 2017 (1 reaction)

Still don’t have a fix for this issue, but I wanted to provide another update.

In my test setup, I’m running an h2 router on port 6262, which forwards requests to a gRPC server running locally on port 8282. When looking at request patterns that trigger the error described in this issue, the bad behavior appears to be happening in the 8282 client (the client that linkerd uses to talk to the gRPC server).

The 8282 client shows two different patterns of state transitions; both are modeled in the sketch after the lists below. The most common one is:

  • outbound HEADER frame endStream=false => stream is Open/RemotePending
  • outbound DATA frame endStream=true => stream changes to LocalClosed/RemotePending
  • inbound HEADER frame endStream=false => stream changes to LocalClosed/RemoteStreaming
  • inbound DATA frame endStream=false => stream stays LocalClosed/RemoteStreaming
  • inbound HEADER frame endStream=true => stream changes to Closed/Closed

A less common pattern (roughly 15% of requests in a random sample) is:

  • outbound HEADER frame endStream=false => stream is Open/RemotePending
  • outbound DATA frame endStream=true => stream stays Open/RemotePending
  • inbound HEADER frame endStream=false => stream changes to Open/RemoteStreaming
  • inbound DATA frame endStream=false => stream stays Open/RemoteStreaming
  • inbound HEADER frame endStream=true => stream changes to Open/RemoteClosed
  • stream changes to Closed/Closed
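
To make the difference between the two orderings concrete, here is a small, hypothetical Scala sketch of the local/remote halves of the stream state. It is an editorial illustration, not linkerd’s actual stream bookkeeping: the names (StreamStateDemo, State, inbound, localClose) are invented, and the idea that the local-close transition is registered asynchronously (for example, only once the write of the endStream=true DATA frame completes) is an assumption introduced here to explain how the same frame sequence can produce either ordering.

```scala
// Hypothetical model of the stream-state transitions listed above.
// Not linkerd code; names and the asynchronous local-close are assumptions.
object StreamStateDemo extends App {
  sealed trait Local
  case object Open extends Local          // request may still be in flight locally
  case object LocalClosed extends Local   // our endStream=true write has been registered

  sealed trait Remote
  case object RemotePending extends Remote    // no response frames seen yet
  case object RemoteStreaming extends Remote  // response HEADERS seen, trailers not yet
  case object RemoteClosed extends Remote     // remote sent endStream=true

  case class State(local: Local, remote: Remote)

  // Inbound frames only ever advance the remote half.
  def inbound(endStream: Boolean)(s: State): State =
    if (endStream) s.copy(remote = RemoteClosed) else s.copy(remote = RemoteStreaming)

  // The local half closes whenever the endStream=true write is registered.
  def localClose(s: State): State = s.copy(local = LocalClosed)

  val start = State(Open, RemotePending)

  // Pattern 1: the outbound close is registered before any response frames arrive.
  val pattern1 = Seq(localClose _, inbound(false) _, inbound(false) _, inbound(true) _)
  // Pattern 2: the response frames arrive before the outbound close is registered.
  val pattern2 = Seq(inbound(false) _, inbound(false) _, inbound(true) _, localClose _)

  def trace(name: String, steps: Seq[State => State]): Unit =
    println(name + ": " + steps.scanLeft(start)((s, f) => f(s)).mkString(" -> "))

  trace("pattern 1", pattern1)
  trace("pattern 2", pattern2)
}
```

Running this prints the two transition sequences from the lists above; the frames are identical, and the only difference is where the local close lands relative to the inbound response frames.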

Both of these patterns result in successful requests, but only the second pattern ever triggers the error in this issue. When the issue is triggered, the final inbound HEADER is received by the 8282 client, but it is never sent as an outbound HEADER on the 6262 server. Messages are passed from the client to the server using util’s AsyncQueue. In the error situation, the queue refuses the final .offer of the last HEADER frame, but it is not clear to me why the offer is refused. This looks like a race condition in which the queue is reset before the offer is made, but I can’t determine where the reset is coming from. Will keep investigating.
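
For readers unfamiliar with util’s AsyncQueue, the following is a minimal, self-contained sketch of the refusal behavior described above. It uses only the public AsyncQueue API (offer, poll, fail); the frame strings and the reset exception are stand-ins introduced here for illustration, and this is not linkerd’s code.

```scala
import com.twitter.concurrent.AsyncQueue
import com.twitter.util.Await

// Sketch of how an offer can be refused: once the queue has been failed
// (reset), subsequent offers return false and the element is dropped.
object OfferAfterReset extends App {
  val q = new AsyncQueue[String]()

  // Normal case: offers are accepted and the reader can poll them.
  assert(q.offer("HEADERS endStream=false"))
  assert(q.offer("DATA endStream=true"))
  println(Await.result(q.poll())) // HEADERS endStream=false

  // If something fails (resets) the queue before the last frame is offered,
  // the offer is refused, so the trailing HEADERS frame never reaches the
  // reader, which is consistent with the 6262 server never emitting it.
  q.fail(new Exception("stream reset"))
  val accepted = q.offer("HEADERS endStream=true")
  println(s"final offer accepted: $accepted") // false
}
```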

Read more comments on GitHub.
