linkerd stops routing HTTP traffic to external DNS names

Filing a Linkerd issue

Issue Type:

  • Bug report
  • Feature request

What happened: After some time, linkerd stops routing traffic to external DNS names and logs

“E 0812 09:45:11.218 UTC THREAD11 TraceId:b86af61cbd38ae05: service failure: com.twitter.finagle.naming.buoyant.DynBoundTimeoutException: Exceeded 30.seconds binding timeout while resolving name: /svc/google.com”

and

“I 0812 09:45:34.190 UTC THREAD11: Reaping /svc/google.com”

Routing to internal services works fine.
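
Until the root cause is found, the failure can at least be detected from the logs. The following is a minimal sketch, assuming the log lines keep the format quoted above; the function name `find_binding_timeouts` and the regular expression are illustrative and not part of linkerd.

```python
import re

# Pattern derived from the single log line quoted in this report (an assumption:
# other linkerd versions may format the message differently).
BINDING_TIMEOUT = re.compile(
    r"DynBoundTimeoutException: Exceeded (\d+)\.seconds binding timeout "
    r"while resolving name: (\S+)"
)


def find_binding_timeouts(lines):
    """Return (name, timeout_seconds) for every binding-timeout log line."""
    hits = []
    for line in lines:
        match = BINDING_TIMEOUT.search(line)
        if match:
            hits.append((match.group(2), int(match.group(1))))
    return hits


# The two log lines from this report, used as sample input.
SAMPLE = [
    "E 0812 09:45:11.218 UTC THREAD11 TraceId:b86af61cbd38ae05: service failure: "
    "com.twitter.finagle.naming.buoyant.DynBoundTimeoutException: "
    "Exceeded 30.seconds binding timeout while resolving name: /svc/google.com",
    "I 0812 09:45:34.190 UTC THREAD11: Reaping /svc/google.com",
]

if __name__ == "__main__":
    for name, seconds in find_binding_timeouts(SAMPLE):
        print(f"binding timeout after {seconds}s for {name}")
```

A check like this could feed an alert or a restart hook, which only automates the workaround described below and does not fix the underlying problem.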

Additional symptom:

The Delegator web page loads without the DTAB form and shows the message

“The request to namerd has timed out. Please ensure your config is correct and try again.”

Restarting linkerd restores routing to external services and the Delegator web page.

What you expected to happen: linkerd shouldn’t require a restart to route traffic to external services.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?: linkerd and the affected services are deployed in a k8s cluster. Router configuration:

    - protocol: http
      label: http-outgoing
      maxRequestKB: 20480
      maxResponseKB: 20480
      httpAccessLog: /var/log/linkerd/l5d-http-outgoing-access.log
      client:
        failureAccrual:
          kind: none
      interpreter:
        kind: io.l5d.k8s.configMap
        experimental: true
        name: l5d-dtabs-config
        filename: http-outgoing
        namespace: servicemesh
      servers:
      - port: 4140
        ip: 0.0.0.0
      bindingTimeoutMs: 30000
      bindingCache:
        paths: 100
        trees: 100
        bounds: 100
        clients: 10
        idleTtlSecs: 5

DTAB used:

  http-outgoing: |-
        /ph        => /$/io.buoyant.rinet ;                     # /ph/80/google.com -> /$/io.buoyant.rinet/80/google.com
        /svc       => /ph/80 ;                                  # /svc/google.com -> /ph/80/google.com
        /svc       => /$/io.buoyant.porthostPfx/ph ;            # /svc/google.com:80 -> /ph/80/google.com
        /k8s       => /#/io.l5d.k8s ;                           # /k8s/default/http/foo -> /#/io.l5d.k8s.http/default/http/foo
        /portNsSvc => /#/portNsSvcToK8s ;                       # /portNsSvc/http/default/foo -> /k8s/default/http/foo
        /host      => /portNsSvc/http/default ;                 # /host/foo -> /portNsSvc/http/default/foo
        /host      => /portNsSvc/http ;                         # /host/default/foo -> /portNsSvc/http/default/foo
        /svc       => /$/io.buoyant.http.domainToPathPfx/host ; # /svc/foo.default -> /host/default/foo	
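
To make the external-name path concrete, here is a simplified sketch of how `/svc/google.com` is rewritten by the first two rules above. It is a hypothetical model, not linkerd's implementation: it treats the dtab as plain prefix rewrites, picks the last matching rule (later dtab entries take precedence), and ignores fallback between alternatives as well as the `porthostPfx` and `domainToPathPfx` rules.

```python
# Simplified, hypothetical model of dtab prefix rewriting.
# Only the two plain rules relevant to external names are included.
DTAB = [
    ("/ph", "/$/io.buoyant.rinet"),
    ("/svc", "/ph/80"),
]


def rewrite_once(path, dtab):
    """Apply the last matching rule, or return None if no rule matches."""
    for prefix, target in reversed(dtab):
        if path == prefix or path.startswith(prefix + "/"):
            return target + path[len(prefix):]
    return None


def delegate(path, dtab, max_steps=10):
    """Rewrite until the path reaches a namer (/$ or /#) or no rule matches."""
    steps = [path]
    for _ in range(max_steps):
        if path.startswith("/$/") or path.startswith("/#/"):
            break
        rewritten = rewrite_once(path, dtab)
        if rewritten is None:
            break
        path = rewritten
        steps.append(path)
    return steps


if __name__ == "__main__":
    for step in delegate("/svc/google.com", DTAB):
        print(step)
```

The resulting chain ends at `/$/io.buoyant.rinet/80/google.com`, matching the comments in the dtab, so the name that times out in the log is the one that depends on resolving the external host.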

Environment:

  • linkerd/namerd version, config files: linkerd 1.6.4; namerd is not used
  • Platform, version, and config files (Kubernetes, DC/OS, etc): Kops-created k8s clusters in AWS (1.11.9, 1.12.7)
  • Cloud provider or hardware configuration: AWS

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 20 (14 by maintainers)

Top GitHub Comments

1 reaction
zaharidichev commented, Sep 12, 2019

@valerii-grachov Thanks a lot for the detailed explanation of the issue and the evidence you provided. We are still trying to reproduce the problem deterministically. By the looks of it, a cache that should not evict under any circumstances is evicting one of these watchers and closing it prematurely. Once we manage to reproduce the problem and find out why this is happening, we will push out a fix.

0 reactions
cpretzer commented, Jan 7, 2020

@j0sh3rs I’m going to close this ticket for now. Please reopen it if you get more details about the behavior and how to reproduce it.
