DNS Issue on Kubernetes (ndots=5 + search domain query)
Expected behavior
DNS resolution always works.
Actual behavior
Some hosts fail to resolve, and lookups keep failing with DnsNameResolverTimeoutException:
SearchDomainUnknownHostException: Search domain query failed. Original hostname: 's3-eu-central-1.amazonaws.com' failed to resolve 's3-eu-central-1.amazonaws.com.default.svc.cluster.local' after 2 queries
at io.netty.resolver.dns.DnsResolveContext.finishResolve(DnsResolveContext.java:845)
at io.netty.resolver.dns.DnsResolveContext.tryToFinishResolve(DnsResolveContext.java:806)
at io.netty.resolver.dns.DnsResolveContext.query(DnsResolveContext.java:333)
at io.netty.resolver.dns.DnsResolveContext.query(DnsResolveContext.java:322)
at io.netty.resolver.dns.DnsResolveContext.access$500(DnsResolveContext.java:62)
...
(17 additional frame(s) were not displayed)
DnsNameResolverTimeoutException: [/100.64.0.10:53] query timed out after 5000 milliseconds (no stack trace available)
The issue is not limited to cluster-external DNS entries; it also occurs when querying <service-name>, which should resolve via the search path. Using the FQDN <service-name>.default.svc.cluster.local makes the issue less noticeable, except for external names, which still go through the search list because of ndots.
The DNS server (CoreDNS pod) is located on the same Kubernetes node. Performing DNS queries on the shell using dig +search [...] or nslookup always yields the correct result.
Steps to reproduce
Inside a Kubernetes pod, try resolving internal names and external names. It doesn't seem to be a network issue: I've seen it on Weave (with the tc fix applied) and on flannel-vxlan, and none of the other, non-Netty pods have any issues with DNS.
This is the /etc/resolv.conf:
nameserver 100.64.0.10
search default.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
options ndots:5
This rules out the fix applied for #8261, because there is only a single DNS server.
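To make the effect of ndots:5 and the search list above concrete, here is a minimal sketch of the conventional expansion rule described in resolv.conf(5). It is not Netty's code; the class name SearchListExpansion and the method candidates are hypothetical and exist only for illustration. A name with fewer dots than ndots is tried with each search domain appended first and as-is last, which matches the first name shown in the exception above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lists the names a resolver following the
// resolv.conf(5) search/ndots convention would try, in order.
final class SearchListExpansion {

    static List<String> candidates(String host, List<String> searchDomains, int ndots) {
        List<String> names = new ArrayList<>();
        // A trailing dot marks the name as absolute: the search list is skipped.
        if (host.endsWith(".")) {
            names.add(host);
            return names;
        }
        int dots = 0;
        for (int i = 0; i < host.length(); i++) {
            if (host.charAt(i) == '.') {
                dots++;
            }
        }
        // With at least ndots dots the name is tried as-is first, otherwise last.
        boolean asIsFirst = dots >= ndots;
        if (asIsFirst) {
            names.add(host);
        }
        for (String domain : searchDomains) {
            names.add(host + "." + domain);
        }
        if (!asIsFirst) {
            names.add(host);
        }
        return names;
    }

    public static void main(String[] args) {
        // Search list and ndots value taken from the resolv.conf above.
        List<String> search = Arrays.asList(
                "default.svc.cluster.local", "svc.cluster.local",
                "cluster.local", "eu-central-1.compute.internal");
        for (String name : candidates("s3-eu-central-1.amazonaws.com", search, 5)) {
            System.out.println(name);
        }
    }
}
```

Under this rule the external name has only two dots, so four search-domain candidates are tried before the name itself. Passing the name with a trailing dot (s3-eu-central-1.amazonaws.com.) yields a single candidate in this sketch.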
Minimal yet complete reproducer code (or URL to code)
Working on that part right now. It is really hard to make this reproducible. I've tried ruling out everything else, see above.
Netty version
4.1.30 (through Vert.x 3.6.3)
JVM version (e.g. java -version)
8
OS version (e.g. uname -a)
CentOS 7.6 (Docker)
Issue Analytics
- State:
- Created 5 years ago
- Comments: 10 (4 by maintainers)
Top GitHub Comments
Disabling IPv6 seems to fix this issue.
https://pracucci.com/kubernetes-dns-resolution-ndots-options-and-why-it-may-affect-application-performances.html