ClusterAllFailedError on version 4.24.1

Hey! We’re using ioredis with our AWS ElastiCache cluster running 3 shards on version 5.0.5. We’re making the redis calls through a lambda with quite high traffic, meaning multiple concurrent lambdas running.

I’ve done some version bumping and I’m intermittently seeing the error ClusterAllFailedError: Failed to refresh slots cache. Through some debugging I’ve narrowed the culprit down to version 4.24.1; any version before that works fine. With DEBUG=ioredis:* set in the lambda env, the ClusterAllFailedError: Failed to refresh slots cache is in most cases followed by these logs:

ioredis:cluster:connectionPool Reset with []
ioredis:cluster:connectionPool Disconnect <ip>:<port> because the node does not hold any slot
ioredis:cluster:connectionPool Remove <ip>:<port> from the pool

When looking at the 4.24.1 commit https://github.com/luin/ioredis/commit/8524eeaedaa2542f119f2b65ab8e2f15644b474e I can tell that code related to this error has been touched - could this fix have introduced unintended issues? Any pointers would be appreciated 👍

Issue Analytics

Created 2 years ago
Reactions: 1
Comments: 11 (2 by maintainers)

Top GitHub Comments

3 reactions

bkvaiude commented, Jun 22, 2021

Thanks, @vaughandroid, for sharing your experience; it is really insightful and helpful.

The problem I faced was a pretty silly one: because of a wrong tls configuration, we were facing a connection issue with AWS Redis.

AWS ElastiCache Clustered without TLS and AUTH

The problematic situation for the developer:

ioredis doesn’t provide the right information in the error it generates.

When we tried to replicate the issue locally by trial and error, we received the same error in each of the following cases:

Passing the wrong connection URL
Passing invalid credentials (username and password)
Wrong tls configuration

As you can see, it is difficult to tell from the error details which of the above cases applies and to act on it. If the right error details were provided, the developer would easily see what went wrong and could take the necessary steps to fix the issue.

In our case, we suspected Redis itself and started looking into its performance metrics.

@trademark18 I hope ioredis can consider this feedback and make the necessary changes to the error handling documentation.

Also, with the client.error callback handling this type of error, it no longer impacts the AWS Lambda function.
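As a rough illustration of the error callback mentioned above, here is a minimal sketch of registering an "error" listener and logging the underlying cause. The helper name logClusterErrors is hypothetical, a plain Node EventEmitter stands in for the Redis.Cluster instance so the sketch runs without a Redis server, and the lastNodeError property is an assumption about what ClusterAllFailedError carries; check it against your ioredis version.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper (not part of ioredis): attach an "error" listener so
// the error is handled and logged with as much detail as is available.
function logClusterErrors(client, log = console.error) {
  client.on("error", (err) => {
    // Assumption: the error may expose the last per-node failure as
    // `lastNodeError`; fall back to "unknown" when it is absent.
    const cause = err.lastNodeError ? err.lastNodeError.message : "unknown";
    log(`${err.name}: ${err.message} (last node error: ${cause})`);
  });
  return client;
}

// Stand-in for a Redis.Cluster instance; only the event interface is used.
const fakeCluster = new EventEmitter();
const lines = [];
logClusterErrors(fakeCluster, (line) => lines.push(line));

// Simulate the error reported in this thread.
const simulated = new Error("Failed to refresh slots cache.");
simulated.name = "ClusterAllFailedError";
simulated.lastNodeError = new Error("Connection is closed.");
fakeCluster.emit("error", simulated);

console.log(lines[0]);
```

With a real client the call would be logClusterErrors(cluster). A listener like this only handles the emitted event; commands that reject still need their own error handling.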

Thank you very much!

1 reaction

trademark18 commented, Jun 14, 2021

Hi all, I’m on 4.27.6 and I’m having a similar issue where I see this error only under heavy load:

ERROR [ioredis] Unhandled error event: ClusterAllFailedError: Failed to refresh slots cache.
    at tryNode (/var/task/node_modules/ioredis/built/cluster/index.js:396:31)
    at /var/task/node_modules/ioredis/built/cluster/index.js:413:21
    at Timeout.<anonymous> (/var/task/node_modules/ioredis/built/cluster/index.js:671:24)
    at Timeout.run (/var/task/node_modules/ioredis/built/utils/index.js:156:22)
    at listOnTimeout (internal/timers.js:557:17)
    at processTimers (internal/timers.js:498:7)

And then the Lambda ends with this error message:

{
  "errorType": "Error",
  "errorMessage": "None of startup nodes is available",
  "stack": [
    "Error: None of startup nodes is available",
    " at Cluster.closeListener (/var/task/node_modules/ioredis/built/cluster/index.js:184:35)",
    " at Object.onceWrapper (events.js:482:28)",
    " at Cluster.emit (events.js:388:22)",
    " at /var/task/node_modules/ioredis/built/cluster/index.js:367:18",
    " at processTicksAndRejections (internal/process/task_queues.js:77:11)"
  ]
}

Here’s my Redis.Cluster() code:

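const Redis = require("ioredis");

// redisSecret is loaded elsewhere and holds the cluster endpoint.
cluster = new Redis.Cluster(
  [
    { host: redisSecret.host }
  ],
  {
    // Pass the hostname through unchanged instead of resolving it to an IP.
    dnsLookup: (address, callback) => callback(null, address),
    redisOptions: {
      tls: true,
    },
    // Wait 100 ms longer after each failed attempt, capped at 2000 ms.
    clusterRetryStrategy: (times) => {
      const ms = Math.min(100 * times, 2000);
      console.log(`Cluster retry #${times}: Will wait ${ms} ms`);
      return ms;
    }
  }
);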
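To make the backoff in that clusterRetryStrategy concrete, here is the same delay calculation pulled out into a standalone function. The name retryDelayMs is only for illustration.

```javascript
// Standalone copy of the delay calculation from the clusterRetryStrategy
// above: 100 ms per attempt, capped at 2000 ms.
function retryDelayMs(times) {
  return Math.min(100 * times, 2000);
}

// Delays for the first three attempts and for two late attempts.
const delays = [1, 2, 3, 20, 50].map(retryDelayMs);
console.log(delays); // [ 100, 200, 300, 2000, 2000 ]
```

The cap is reached at attempt 20, so every later retry waits the full 2 seconds.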

Questions:

Can I avoid it entirely?
Since the Lambda errors out, it’s going to trigger alarms and such. I’d rather have it just silently attempt to reconnect to the cluster. Is there some way to catch this and then just proceed silently to the retry?
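One possible way to approach the second question is to wrap each Redis call in a small retry helper so that a transient failure is retried instead of failing the invocation. This is only a sketch under assumptions: withRetry and flakyGet are hypothetical names, the helper is plain JavaScript rather than an ioredis feature, and it does not address whatever causes the error under load.

```javascript
// Hypothetical helper: run an async operation, retrying on failure with a
// capped linear backoff. Rethrows the last error once attempts run out.
async function withRetry(operation, { attempts = 3, baseMs = 100, maxMs = 2000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        const ms = Math.min(baseMs * attempt, maxMs);
        await new Promise((resolve) => setTimeout(resolve, ms));
      }
    }
  }
  throw lastError;
}

// Stand-in for a Redis call that fails twice and then succeeds.
let calls = 0;
const flakyGet = async () => {
  calls += 1;
  if (calls < 3) throw new Error("Failed to refresh slots cache.");
  return "OK";
};

const demo = withRetry(flakyGet, { attempts: 3, baseMs: 1 });
demo.then((value) => console.log(value, calls));
```

With a real client the call would look like withRetry(() => cluster.get(key)). If every attempt fails the error still propagates, so alarms fire only for failures that persist.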

Sorry to comment on a closed issue but this is the only place I’ve found anyone talking about seeing this error only under heavy load as opposed to just incorrect network configuration or similar.
