TTL timeout during planned failover / ElastiCache maintenance


The fix in #385 was incomplete: the cluster might not be reinitialized if reinitialize_counter has already been increased. See https://github.com/Grokzen/redis-py-cluster/blob/f0aaaa4e539bc62e38ce6e23839a12ff192cb7ea/rediscluster/client.py#L661-L664 and https://github.com/Grokzen/redis-py-cluster/blob/f0aaaa4e539bc62e38ce6e23839a12ff192cb7ea/rediscluster/nodemanager.py#L348-L352

During a planned ElastiCache failover, the old primary redirects requests to the new one by sending MOVED responses (thereby increasing reinitialize_counter) before it is terminated. See https://github.com/Grokzen/redis-py-cluster/blob/f0aaaa4e539bc62e38ce6e23839a12ff192cb7ea/rediscluster/client.py#L689

An alternative to the counter would be to reinitialize on every MOVED response (as discussed before), since a MOVED response always affects a larger number of slots (at least with ElastiCache). This approach is also suggested in the Redis documentation:

"When a redirection is encountered, it is likely multiple slots were reconfigured rather than just one, so updating the client configuration as soon as possible is often the best strategy." (https://redis.io/topics/cluster-spec)
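The difference between the two strategies can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToySlotCache`, `on_moved`, `simulate`, and the threshold of 25 are hypothetical names and values chosen for illustration, and the modulo-based counter is an assumed simplification, not the redis-py-cluster code linked above.

```python
class ToySlotCache:
    """Hypothetical client-side slot cache; NOT redis-py-cluster code."""

    def __init__(self, reinitialize_steps, reinit_on_every_moved=False):
        # Assumed threshold: refresh only on every N-th MOVED response.
        self.reinitialize_steps = reinitialize_steps
        self.reinit_on_every_moved = reinit_on_every_moved
        self.reinitialize_counter = 0
        self.reinitializations = 0

    def reinitialize(self):
        # A real client would re-fetch the full slot map from the cluster here.
        self.reinitializations += 1

    def on_moved(self):
        if self.reinit_on_every_moved:
            # Alternative strategy: refresh the slot map on every MOVED.
            self.reinitialize()
            return
        # Counter strategy: refresh only when the counter hits a multiple
        # of the threshold.
        self.reinitialize_counter += 1
        if self.reinitialize_counter % self.reinitialize_steps == 0:
            self.reinitialize()


def simulate(cache, moved_responses):
    """Feed a number of MOVED responses to the cache and count refreshes."""
    for _ in range(moved_responses):
        cache.on_moved()
    return cache.reinitializations


# The old primary sends only a few MOVED responses before it goes away.
counter_based = simulate(ToySlotCache(reinitialize_steps=25), moved_responses=3)
eager = simulate(
    ToySlotCache(reinitialize_steps=25, reinit_on_every_moved=True),
    moved_responses=3,
)
print(counter_based, eager)
```

In this model the counter-based cache never refreshes its slot map before the old primary disappears, while the eager variant refreshes on the first redirect. The trade-off is that eager refreshing issues more topology queries during resharding.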

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
guytubul86 commented, Jul 7, 2021

Hi, is there any update on this issue? I also ran into the same problem during an AWS maintenance window.

0 reactions
mschfh commented, Jun 22, 2022

This was fixed in redis-py: after 5 retry attempts (with a 250 ms delay each), reinitialization is forced. See https://github.com/redis/redis-py/blob/bea72995fd39b01e2f0a1682b16b6c7690933f36/redis/cluster.py#L1119-L1138
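The behavior described in this comment can be sketched roughly as follows. This is a loose illustration, not redis-py's implementation: `ConnectionFailed`, `execute_with_forced_reinit`, `send`, and `reinitialize` are hypothetical names, and only the attempt count (5) and the delay (250 ms) come from the comment.

```python
import time


class ConnectionFailed(Exception):
    """Stand-in for a connection or timeout error from a terminated node."""


def execute_with_forced_reinit(send, reinitialize, max_attempts=5,
                               delay=0.25, sleep=time.sleep):
    """Retry `send`; after `max_attempts` failures, force a topology refresh.

    `send` and `reinitialize` are caller-supplied callables (hypothetical).
    """
    for _ in range(max_attempts):
        try:
            return send()
        except ConnectionFailed:
            # Wait briefly in case the failover completes on its own.
            sleep(delay)
    # All attempts failed: refresh the slot map, then try one more time.
    reinitialize()
    return send()


# Toy scenario: the cached primary is gone until the topology is refreshed.
state = {"topology_fresh": False, "calls": 0}


def send():
    state["calls"] += 1
    if not state["topology_fresh"]:
        raise ConnectionFailed("old primary is gone")
    return "OK"


def reinitialize():
    state["topology_fresh"] = True


# Sleep is stubbed out so the demo does not actually wait.
result = execute_with_forced_reinit(send, reinitialize, sleep=lambda s: None)
print(result, state["calls"])
```

Bounding the retries this way means a stale slot map cannot keep a command pointed at a dead node until its TTL runs out.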

