
L5d fails with DynBoundTimeoutException after target service is updated



Issue Type: Bug report

What happened: When the k8s deployment is applied and a new version is rolled out, L5d fails with:

  E 0720 15:10:25.148 UTC THREAD10: service failure: com.twitter.finagle.naming.buoyant.DynBoundTimeoutException: Exceeded 10.seconds binding timeout while connecting to /#/io.l5d.k8s/default/http/my_service_name for name: /svc/my_service_name

What you expected to happen: L5d should pick up the new service endpoints and route traffic to the new pods.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?: Both l5d and the target service are deployed in k8s. Server and binding-cache config for the router in use:

  servers:
  - port: XXXX
    ip: 0.0.0.0
    clearContext: true
  bindingCache:
    paths: 100
    trees: 100
    bounds: 100
    clients: 10
    idleTtlSecs: 5
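For context, the binding path in the error (`/#/io.l5d.k8s/default/http/my_service_name` for `/svc/my_service_name`) suggests a router wired to the Kubernetes namer roughly as below, where the path segments after `/#/io.l5d.k8s` are the namespace (`default`), the service port name (`http`) and the service name. This is a minimal sketch, not the reporter's config: the dtab `/svc => /#/io.l5d.k8s/default/http;`, the namer reaching the API through a `kubectl proxy` on `localhost:8001`, and port 4140 (standing in for the redacted `XXXX`) are all assumptions. The `10.seconds` in the log appears to correspond to the router's `bindingTimeoutMs` setting, written out explicitly here as 10000 ms.

```yaml
# Hypothetical linkerd 1.x config sketch; only servers/bindingCache
# mirror the reporter's fragment, everything else is assumed.
namers:
- kind: io.l5d.k8s
  host: localhost        # assumes a kubectl proxy sidecar
  port: 8001

routers:
- protocol: http
  # /svc/<name> is rewritten to /#/io.l5d.k8s/<namespace>/<port-name>/<name>
  dtab: |
    /svc => /#/io.l5d.k8s/default/http;
  # Max time to bind a name before DynBoundTimeoutException is raised
  bindingTimeoutMs: 10000
  servers:
  - port: 4140           # placeholder for the redacted XXXX
    ip: 0.0.0.0
    clearContext: true
  bindingCache:
    paths: 100
    trees: 100
    bounds: 100
    clients: 10
    idleTtlSecs: 5
```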

Rolling update details:

  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 50%
  minReadySeconds: 20

Environment:

  • linkerd/namerd version, config files: 1.4.2
  • Platform, version, and config files (Kubernetes, DC/OS, etc.): Kubernetes on AWS (not EKS)
  • Cloud provider or hardware configuration:

The same issue also occurred in December.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 20
  • Comments: 29 (16 by maintainers)

Top GitHub Comments

chrisgoffinet commented on Oct 24, 2018:

An update now that I've started looking into this issue: I can easily reproduce it now. I've confirmed that the K8s endpoint notifications do get sent to Linkerd on destroy and create; it's the client_state.json that shows the staleness. Let me also explain why this is hard to catch when running on, say, your laptop. If you just destroy pods, they come back up very quickly and reuse the same IP addresses, so when you try to hit this case it looks like there's no bug, even though client_state.json is technically stale.

Only after I modified my K8s deployment to inject an init container that sleeps for 30s on pod creation could we see this bug surface easily. It never seems to recover unless you stop all traffic and let the 10m idle timeout kick in, which destroys all the state.
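The repro trick described above can be sketched as a Deployment whose pods run a 30-second `sleep` in an init container before the application container starts. This is an illustrative sketch, not chrisgoffinet's actual manifest: the name `my-service`, the images `busybox:1.36` and `my-service:latest`, and the container port are hypothetical; the replica count and rolling-update settings are copied from the reporter's details.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service                 # hypothetical name
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 50%
  minReadySeconds: 20
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      # Keeps each new pod in the Init phase for ~30s before the app
      # container starts, widening the window in which endpoints change.
      initContainers:
      - name: startup-delay
        image: busybox:1.36
        command: ["sh", "-c", "sleep 30"]
      containers:
      - name: my-service
        image: my-service:latest   # hypothetical image
        ports:
        - name: http
          containerPort: 8080      # hypothetical port
```

Any change to the pod template (for example, bumping the image tag) then triggers a rolling update in which each replacement pod takes at least 30s to start.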

Now that this is easy to reproduce, I should be able to track down where in the code we're missing this.

sahilbadla27 commented on Nov 8, 2018:

@adleong’s fix (v1.5.1-stab) looks promising. It’s been deployed in our dev cluster for a day now and we haven’t seen any issues.
