linkerd stops routing HTTP traffic to external DNS names
Filing a Linkerd issue
Issue Type:
- Bug report
- Feature request
What happened: After some time, linkerd stops routing traffic to external DNS names and logs
“E 0812 09:45:11.218 UTC THREAD11 TraceId:b86af61cbd38ae05: service failure: com.twitter.finagle.naming.buoyant.DynBoundTimeoutException: Exceeded 30.seconds binding timeout while resolving name: /svc/google.com”
and
“I 0812 09:45:34.190 UTC THREAD11: Reaping /svc/google.com”
Routing to internal services works fine.
Additional symptom:
The Delegator web page loads without the dtab form and shows the message
“The request to namerd has timed out. Please ensure your config is correct and try again.”
Restarting linkerd restores both routing to external services and the Delegator web page.
What you expected to happen: linkerd shouldn’t require a restart to keep routing traffic to external services
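A small probe run next to linkerd can help pin down when external routing stops. The following is a minimal sketch, not a confirmed reproduction: it assumes the http-outgoing router is reachable at `localhost:4140` (the port comes from the router configuration; the host is an assumption) and that linkerd routes by the `Host` header, so `Host: google.com` maps to `/svc/google.com`. The names `build_probe` and `probe` are hypothetical helpers.

```python
import urllib.error
import urllib.request

# Assumption: the http-outgoing router listens here; adjust for your deployment.
LINKERD_URL = "http://localhost:4140/"


def build_probe(host, linkerd_url=LINKERD_URL):
    """Build a request to linkerd whose Host header names the external service."""
    return urllib.request.Request(linkerd_url, headers={"Host": host})


def probe(host, timeout=35.0, opener=urllib.request.urlopen):
    """Return the HTTP status linkerd answers with, or None if it is unreachable.

    The default timeout is slightly above the 30 s binding timeout so that a
    binding failure surfaces as an HTTP error status rather than a client timeout.
    """
    try:
        with opener(build_probe(host), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # linkerd answered, but with an error status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure for linkerd itself, or client timeout.
        return None
```

Calling `probe("google.com")` in a loop with a timestamp, alongside a probe of an internal service, would show whether only external names start failing and roughly when.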
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: linkerd and the affected services are deployed in a k8s cluster.
Router configuration:
- protocol: http
  label: http-outgoing
  maxRequestKB: 20480
  maxResponseKB: 20480
  httpAccessLog: /var/log/linkerd/l5d-http-outgoing-access.log
  client:
    failureAccrual:
      kind: none
  interpreter:
    kind: io.l5d.k8s.configMap
    experimental: true
    name: l5d-dtabs-config
    filename: http-outgoing
    namespace: servicemesh
  servers:
  - port: 4140
    ip: 0.0.0.0
  bindingTimeoutMs: 30000
  bindingCache:
    paths: 100
    trees: 100
    bounds: 100
    clients: 10
    idleTtlSecs: 5
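To make the `idleTtlSecs: 5` setting concrete, here is a toy idle-TTL cache. It is a sketch under an assumption, namely that the setting means "drop cached entries that have not been used for that many seconds". It is not linkerd's implementation, the class name `IdleTtlCache` is hypothetical, and nothing in this report confirms that the binding cache is the cause of the failure.

```python
import time


class IdleTtlCache:
    """Toy cache that drops entries not used within idle_ttl_secs."""

    def __init__(self, idle_ttl_secs, clock=time.monotonic):
        self.idle_ttl_secs = idle_ttl_secs
        self.clock = clock          # injectable so the behavior can be tested
        self._entries = {}          # key -> (value, last_used_timestamp)

    def reap(self):
        """Evict idle entries and return the list of evicted keys."""
        now = self.clock()
        stale = [key for key, (_, used) in self._entries.items()
                 if now - used >= self.idle_ttl_secs]
        for key in stale:
            del self._entries[key]
        return stale

    def get(self, key, build):
        """Return the cached value, rebuilding it if it was evicted or never built."""
        self.reap()
        if key in self._entries:
            value, _ = self._entries[key]
        else:
            value = build(key)
        # Every access refreshes the idle timer for this key.
        self._entries[key] = (value, self.clock())
        return value
```

With a 5-second idle TTL, a name that receives a request less often than every 5 seconds is rebuilt on each request, so any slowness in rebuilding (for example a DNS lookup) is paid repeatedly.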
DTAB used:
http-outgoing: |-
  /ph => /$/io.buoyant.rinet ; # /ph/80/google.com -> /$/io.buoyant.rinet/80/google.com
  /svc => /ph/80 ; # /svc/google.com -> /ph/80/google.com
  /svc => /$/io.buoyant.porthostPfx/ph ; # /svc/google.com:80 -> /ph/80/google.com
  /k8s => /#/io.l5d.k8s ; # /k8s/default/http/foo -> /#/io.l5d.k8s.http/default/http/foo
  /portNsSvc => /#/portNsSvcToK8s ; # /portNsSvc/http/default/foo -> /k8s/default/http/foo
  /host => /portNsSvc/http/default ; # /host/foo -> /portNsSvc/http/default/foo
  /host => /portNsSvc/http ; # /host/default/foo -> /portNsSvc/http/default/foo
  /svc => /$/io.buoyant.http.domainToPathPfx/host ; # /svc/foo.default -> /host/default/foo
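The path that the failing name takes through this dtab can be traced with a small sketch. It models only the two plain prefix rules (`/svc => /ph/80` and `/ph => /$/io.buoyant.rinet`) and follows the rewrites given in the dtab comments. It deliberately ignores real dtab semantics such as rule precedence, fallback between alternatives, and the rewriting namers (`porthostPfx`, `domainToPathPfx`), so it illustrates one delegation chain and is not a dtab interpreter. The function names are hypothetical.

```python
# Only the two plain prefix rules from the dtab; the rewriting namers
# (io.buoyant.porthostPfx, io.buoyant.http.domainToPathPfx) are left out.
RULES = [
    ("/ph", "/$/io.buoyant.rinet"),
    ("/svc", "/ph/80"),
]


def rewrite_once(path, rules):
    """Apply the first rule whose prefix matches on whole path segments."""
    for prefix, dest in rules:
        if path == prefix or path.startswith(prefix + "/"):
            return dest + path[len(prefix):]
    return None  # no rule applies


def trace(path, rules=RULES, max_steps=10):
    """Return every intermediate path until no rule matches."""
    steps = [path]
    for _ in range(max_steps):
        rewritten = rewrite_once(steps[-1], rules)
        if rewritten is None:
            break
        steps.append(rewritten)
    return steps


for step in trace("/svc/google.com"):
    print(step)
```

The chain ends at `/$/io.buoyant.rinet/80/google.com`, which is the path that has to be bound within the 30-second binding timeout named in the error message.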
Environment:
- linkerd/namerd version, config files: linkerd 1.6.4; namerd is not used
- Platform, version, and config files (Kubernetes, DC/OS, etc): Kops created k8s cluster in AWS (1.11.9, 1.12.7)
- Cloud provider or hardware configuration: AWS
Issue Analytics
- Created 4 years ago
- Reactions:2
- Comments:20 (14 by maintainers)
Top GitHub Comments
@valerii-grachov Thanks a lot for the detailed explanation of the issue and the evidence you provided. We are still trying to reproduce the problem deterministically. By the looks of it, a cache that should not evict under any circumstances is evicting one of these watchers and closing it prematurely. Once we manage to reproduce the problem and find out why this is happening, we will push out a fix.
@j0sh3rs I’m going to close this ticket for now. Please reopen it if you get more details about the behavior and how to reproduce it.