
Long-lived h2 gRPC connections stop forwarding requests

See original GitHub issue

We’re seeing an issue with linkerd 1.1.0 where, after 12-18 hours, deadline expirations start to occur regularly when sending traffic over an h2 router. The longer the linkerd instance has been running, the more frequently they occur.

There don’t appear to be any relevant linkerd log messages. The client observes this behavior as a timeout, and sends an h2 reset frame after its deadline expires:

Jun 15 14:07:11 nomad-client2-p.dal10sl.bigcommerce.net linkerd[3398]: W 0615 19:07:11.311 UTC THREAD55 TraceId:e3c3649cc6eaf30c: Exception propagated to the default monitor (upstream address: /172.17.0.4:42448, downstream address: /10.143.147.85:4143, label: %/io.l5d.port/4143/#/io.l5d.consul/.local/storeconfig).
Jun 15 14:07:11 nomad-client2-p.dal10sl.bigcommerce.net linkerd[3398]: Reset.Cancel
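As a rough illustration (not linkerd's or Finagle's actual code), the deadline-then-cancel sequence in the log above can be sketched with asyncio: the client gives up when its deadline expires and cancels the in-flight stream, which on the wire is an h2 RST_STREAM frame with the CANCEL error code. `slow_upstream` and the timeout value here are hypothetical stand-ins.

```python
import asyncio

async def slow_upstream() -> str:
    # Stand-in for a wedged router that never forwards the request.
    await asyncio.sleep(10)
    return "response"

async def call_with_deadline(deadline_s: float) -> str:
    # On deadline expiry the client cancels the stream; the h2
    # equivalent is an RST_STREAM frame with the CANCEL error code,
    # which Finagle surfaces as Reset.Cancel.
    try:
        return await asyncio.wait_for(slow_upstream(), timeout=deadline_s)
    except asyncio.TimeoutError:
        return "Reset.Cancel"

result = asyncio.run(call_with_deadline(0.05))
print(result)  # the request is cancelled, mirroring the log line above
```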

These are the relevant failure metrics for the client in question (h2-out):

"rt/h2-out/client/%/io.l5d.port/4143/#/io.l5d.consul/.local/storeconfig/failures": 79858,
"rt/h2-out/client/%/io.l5d.port/4143/#/io.l5d.consul/.local/storeconfig/failures/com.twitter.finagle.buoyant.h2.Reset$Cancel$": 79813,
"rt/h2-out/client/%/io.l5d.port/4143/#/io.l5d.consul/.local/storeconfig/failures/com.twitter.finagle.buoyant.h2.Reset$Refused$": 6,
"rt/h2-out/client/%/io.l5d.port/4143/#/io.l5d.consul/.local/storeconfig/failures/com.twitter.finagle.buoyant.h2.Reset$InternalError$": 39,

I’ll attach a full metrics dump below.
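To put the counters above in proportion, here is a small sketch that parses a fragment of such a metrics dump (the JSON below is hand-assembled from the values quoted above, not a real dump) and computes what share of failures are client cancels:

```python
import json

# Hand-assembled fragment of a linkerd metrics dump, trimmed to the
# failure counters quoted above.
metrics = json.loads("""{
  "rt/h2-out/client/%/io.l5d.port/4143/#/io.l5d.consul/.local/storeconfig/failures": 79858,
  "rt/h2-out/client/%/io.l5d.port/4143/#/io.l5d.consul/.local/storeconfig/failures/com.twitter.finagle.buoyant.h2.Reset$Cancel$": 79813,
  "rt/h2-out/client/%/io.l5d.port/4143/#/io.l5d.consul/.local/storeconfig/failures/com.twitter.finagle.buoyant.h2.Reset$Refused$": 6,
  "rt/h2-out/client/%/io.l5d.port/4143/#/io.l5d.consul/.local/storeconfig/failures/com.twitter.finagle.buoyant.h2.Reset$InternalError$": 39
}""")

prefix = "rt/h2-out/client/%/io.l5d.port/4143/#/io.l5d.consul/.local/storeconfig/failures"
total = metrics[prefix]
cancels = metrics[prefix + "/com.twitter.finagle.buoyant.h2.Reset$Cancel$"]
print(f"Reset.Cancel share: {cancels / total:.1%}")  # ~99.9% of failures are client cancels
```

Almost every failure is a `Reset.Cancel`, which fits the picture of the client timing out and cancelling rather than the server refusing work.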

Visually, this is what the client looks like in the linkerd admin console when it gets into this state: [screenshot of the admin console]

Issue Analytics

  • State:closed
  • Created 6 years ago
  • Comments:20 (20 by maintainers)

Top GitHub Comments

2 reactions
adleong commented, Jul 13, 2017

🎉 🎉 🎉

1 reaction
adleong commented, Jul 3, 2017
Read more comments on GitHub >

Top Results From Across the Web

gRPC Long-lived Streaming - Code The Cloud
Implementing gRPC long-lived streaming - a tool for cloud native applications. … A typical RPC is an immediate request-response mechanism.

HAProxy 1.9.2 Adds gRPC Support
This allows data to be piped back and forth over a long-lived connection, breaking free of the limitations of the request/response-per-message …

New – Application Load Balancer Support for End-to-End …
In this way, you can use ALBs to terminate, route and load balance the gRPC traffic between your microservices or between gRPC-enabled clients …

Long-lived channels and asynchronous calls - Google Groups
A simple NOOP service that you issue a request to every 15 minutes-2 hours would be enough. You could ignore the response of …

HAProxy version 2.4.15 - Configuration Manual - GitHub Pages
If a server supports long lines, it may make sense to set this value … good enough distribution and connections are extremely short-lived. …
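The NOOP-ping workaround from the Google Groups thread above can be sketched as a periodic keep-alive loop. Everything here is hypothetical: `noop_ping` stands in for a trivial RPC (or HTTP/2 PING via channel keepalive), and the interval is shortened so the sketch runs quickly.

```python
import asyncio

async def noop_ping() -> None:
    # Stand-in for a trivial RPC against a NOOP service; in gRPC terms
    # this could equally be HTTP/2 PING frames via channel keepalive.
    await asyncio.sleep(0)

async def keep_alive(interval_s: float, rounds: int) -> int:
    # Issue a ping every `interval_s` seconds so intermediaries keep
    # seeing traffic on the connection instead of silently dropping it.
    pings = 0
    for _ in range(rounds):
        await noop_ping()
        pings += 1
        await asyncio.sleep(interval_s)
    return pings

sent = asyncio.run(keep_alive(0.01, 3))
print(sent)  # 3 pings sent
```

In a real deployment the interval would be on the order of minutes, as the thread suggests, and the loop would run for the lifetime of the channel.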
