DNS lookup failure causes cluster connection to hang

In current ioredis (v4.14.1), when DNS resolution fails on connection to one host in a Redis cluster, the cluster connection closes and attempts to reconnect.

With the following test script:

const Redis = require('ioredis');
const hosts = [
  "a.valid.cluster.host",
  "host.that.does.not.exist"
];
const redis = new Redis.Cluster(hosts.map(h => ({ port: 6379, host: h })));
redis.on('connect', () => console.log('Redis connected'));
redis.on('ready', () => console.log('Redis ready'));
redis.on('error', err => console.error('Redis error', err));
redis.on('end', () => console.log('Redis closed'));

With ioredis v3.2.2 - v4.1.0:

$ node index.js
Redis connected
Redis ready

… and the program waits in a connected state

With ioredis v4.2.0 - v4.3.0:

$ node index.js
$

… the process exits immediately with no output or errors

With ioredis v4.3.1 - v4.14.1:

$ node index.js

… and the program neither connects nor exits.

With debug logging enabled in that case, we see:

$ DEBUG=ioredis.* node index.js
ioredis:cluster status: [empty] -> connecting +0ms
ioredis:cluster resolved hostname working.host.example.com to IP 127.0.0.1 +5ms
ioredis:cluster failed to resolve hostname does.not.exist to IP: getaddrinfo ENOTFOUND does.not.exist +5ms
ioredis:cluster status: connecting -> close +0ms
ioredis:cluster closed because Error: getaddrinfo ENOTFOUND does.not.exist +0ms
ioredis:cluster status: close -> reconnecting +0ms
ioredis:cluster connecting failed: Error: getaddrinfo ENOTFOUND does.not.exist +4ms
ioredis:cluster Cluster is disconnected. Retrying after 102ms +100ms
ioredis:cluster status: reconnecting -> connecting +0ms
ioredis:cluster resolved hostname working.host.example.com to IP 127.0.0.1 +1ms
ioredis:cluster failed to resolve hostname does.not.exist to IP: getaddrinfo ENOTFOUND does.not.exist +1ms
ioredis:cluster status: connecting -> close +0ms
ioredis:cluster closed because Error: getaddrinfo ENOTFOUND does.not.exist +1ms
ioredis:cluster status: close -> reconnecting +0ms
ioredis:cluster Got error Error: getaddrinfo ENOTFOUND does.not.exist when reconnecting. Ignoring... +0ms
ioredis:cluster Cluster is disconnected. Retrying after 104ms +104ms

That points to the commit https://github.com/luin/ioredis/commit/21138af
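
The `Retrying after 102ms` and `104ms` lines in the debug log are consistent with the default `clusterRetryStrategy` described in the ioredis README, `Math.min(100 + times * 2, 2000)`, which never gives up. As a rough sketch of how the retry loop could be bounded, the helper below returns a non-number once a limit is reached, which the README describes as the signal to stop reconnecting. `boundedRetry` and `maxAttempts` are hypothetical names, and whether the affected versions then surface an error or an `end` event has not been verified here.

```javascript
// Hypothetical helper: builds a retry strategy that backs off like the
// delays seen in the log (102ms, 104ms, ...) but eventually gives up.
function boundedRetry(maxAttempts) {
  return function clusterRetryStrategy(times) {
    if (times > maxAttempts) {
      // Per the ioredis README, a non-number return stops reconnecting.
      return null;
    }
    return Math.min(100 + times * 2, 2000);
  };
}

const strategy = boundedRetry(5);
console.log(strategy(1), strategy(2), strategy(6)); // 102 104 null

// Assumed usage (not executed here):
// new Redis.Cluster(nodes, { clusterRetryStrategy: boundedRetry(5) });
```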

In the case of a bad DNS lookup on one of many cluster hosts, I would expect either the 3.2.2 behavior of connecting using the good hosts, or an emitted error. The hanging behavior seems undesirable.
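
Until the client itself reports the failure, one way to turn the silent hang into a visible error is an application-side watchdog. The sketch below is only an illustration: `waitForReady` is a hypothetical helper, not part of ioredis, and it assumes nothing beyond the client being an event emitter that fires `ready`, as the test script above relies on.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: resolves when `emitter` emits 'ready', rejects if
// that has not happened within `timeoutMs`.
function waitForReady(emitter, timeoutMs) {
  return new Promise((resolve, reject) => {
    const onReady = () => {
      clearTimeout(timer);
      resolve();
    };
    const timer = setTimeout(() => {
      emitter.removeListener('ready', onReady);
      reject(new Error(`not ready after ${timeoutMs}ms`));
    }, timeoutMs);
    emitter.once('ready', onReady);
  });
}

// Demo with a stand-in emitter that never becomes ready.
const stuckClient = new EventEmitter();
waitForReady(stuckClient, 50).catch((err) => console.error(err.message));
// prints "not ready after 50ms"

// Assumed usage with the cluster client from the test script:
// waitForReady(redis, 5000).catch((err) => {
//   console.error(err);
//   redis.disconnect();
// });
```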

For my operational needs, I would rather see a DNS lookup failure behave like a down server – let the cluster connection succeed without that server.
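
A possible application-side approximation of that behavior is to drop unresolvable hosts before they reach `Redis.Cluster`. This is a minimal sketch, not a fix in ioredis: `resolvableNodes` is a hypothetical helper, it uses Node's `dns.promises.lookup` by default, and it only covers the initial startup list, so a lookup that starts failing during a later reconnect is not handled.

```javascript
const dns = require('dns');

// Hypothetical helper: keeps only the startup nodes whose hostname resolves.
// `lookup` is injectable so the filter can be exercised without real DNS.
async function resolvableNodes(nodes, lookup = dns.promises.lookup) {
  const checked = await Promise.all(
    nodes.map(async (node) => {
      try {
        await lookup(node.host);
        return node;
      } catch (err) {
        // Treat a failed lookup like a down server: log it and move on.
        console.error(`Skipping ${node.host}: ${err.code || err.message}`);
        return null;
      }
    })
  );
  return checked.filter(Boolean);
}

// Assumed usage with the `hosts` array from the test script (not executed here):
// resolvableNodes(hosts.map(h => ({ port: 6379, host: h })))
//   .then(nodes => new Redis.Cluster(nodes));
```

If every lookup fails, the returned list is empty, so a caller would probably want to treat that case as a hard error rather than construct a cluster client with no startup nodes.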

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

2 reactions
brettkiefer commented, Dec 4, 2019

This is a real and ongoing issue, yes? Shouldn’t allow autoclose?

0 reactions
stale[bot] commented, Jan 3, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed.
