TTL timeout during planned failover / ElastiCache maintenance
The fix in #385 was incomplete: the cluster might not be reinitialized if reinitialize_counter has already been incremented past the threshold:
https://github.com/Grokzen/redis-py-cluster/blob/f0aaaa4e539bc62e38ce6e23839a12ff192cb7ea/rediscluster/client.py#L661-L664
https://github.com/Grokzen/redis-py-cluster/blob/f0aaaa4e539bc62e38ce6e23839a12ff192cb7ea/rediscluster/nodemanager.py#L348-L352
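To illustrate the failure mode (this is a minimal, simplified sketch, not the library's actual code): a threshold check that only fires on an exact counter value is skipped entirely once other code paths have already pushed the counter past the trigger point.

```python
class NodeManager:
    """Simplified sketch -- not the actual redis-py-cluster implementation."""

    def __init__(self, reinitialize_steps=5):
        self.reinitialize_steps = reinitialize_steps
        self.reinitialize_counter = 0

    def initialize(self):
        print("refreshing cluster slot mapping")

    def maybe_reinitialize(self):
        # An exact-equality check only fires when the counter lands
        # precisely on the threshold.  If MOVED handling has already
        # pushed the counter past it, the refresh never happens and the
        # client keeps using a stale slot map.
        if self.reinitialize_counter == self.reinitialize_steps:
            self.reinitialize_counter = 0
            self.initialize()


manager = NodeManager(reinitialize_steps=5)
manager.reinitialize_counter = 6   # already incremented past the threshold
manager.maybe_reinitialize()       # nothing happens -- the refresh is skipped
```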
During a planned ElastiCache failover, the old primary redirects requests to the new one by sending MOVED responses (thereby increasing reinitialize_counter) before terminating the old primary.
https://github.com/Grokzen/redis-py-cluster/blob/f0aaaa4e539bc62e38ce6e23839a12ff192cb7ea/rediscluster/client.py#L689
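For context, a MOVED reply names the hash slot and the address of the node that now owns it (e.g. `MOVED 3999 10.0.0.12:6379`). A rough, purely illustrative parsing sketch (not the client's internal code):

```python
def parse_moved_error(message: str):
    """Parse a Redis Cluster MOVED error, e.g. 'MOVED 3999 10.0.0.12:6379'."""
    _, slot, address = message.split(" ")
    host, port = address.rsplit(":", 1)
    return int(slot), host, int(port)


slot, host, port = parse_moved_error("MOVED 3999 10.0.0.12:6379")
print(slot, host, port)  # 3999 10.0.0.12 6379
```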
An alternative approach to the counter would be to reinitialize on all MOVED responses (as discussed before), as those would always affect a larger number of slots (at least with ElastiCache). This approach is also suggested in the Redis documentation:
> When a redirection is encountered, it is likely multiple slots were reconfigured rather than just one, so updating the client configuration as soon as possible is often the best strategy.
>
> https://redis.io/topics/cluster-spec
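A hedged sketch of that alternative, refreshing the slot cache on every MOVED response instead of counting them. The class and method names here are hypothetical, not redis-py-cluster's actual execute path:

```python
class MovedError(Exception):
    """Raised when a node replies with a MOVED redirection."""


class SimplifiedClusterClient:
    """Illustrative only -- not redis-py-cluster's actual client."""

    def __init__(self, node_manager):
        self.node_manager = node_manager

    def execute_command(self, *args, _retries=3):
        node = self.node_manager.node_for_command(args)
        try:
            return node.send_command(*args)
        except MovedError:
            if _retries == 0:
                raise
            # A MOVED reply usually means many slots changed ownership,
            # so refresh the whole slot mapping right away instead of
            # counting redirections and refreshing only every N moves.
            self.node_manager.initialize()
            return self.execute_command(*args, _retries=_retries - 1)
```

The trade-off is extra CLUSTER SLOTS/NODES calls during a resharding burst, in exchange for never being stuck on a stale topology.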

Hi, any update on this issue? I also ran into the same issue during an AWS maintenance window.
This was fixed in redis-py: after 5 retry attempts (with a 250 ms delay each), reinitialization is forced: https://github.com/redis/redis-py/blob/bea72995fd39b01e2f0a1682b16b6c7690933f36/redis/cluster.py#L1119-L1138
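The general shape of that fix is a bounded retry loop with a short sleep, forcing a full topology refresh once the retries are exhausted. A simplified sketch of that pattern (the function and its callable parameters are assumptions for illustration, not the library's exact code):

```python
import time


def execute_with_retries(send, reinitialize, retries=5, delay=0.25):
    """Retry a cluster command, forcing a slot-map refresh as a last resort.

    ``send`` and ``reinitialize`` are caller-supplied callables; this only
    mirrors the general shape of redis-py's retry loop.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return send()
        except ConnectionError as err:
            last_error = err
            time.sleep(delay)
            if attempt == retries - 1:
                # After the final failed attempt, rebuild the cluster
                # topology so the next call starts from a fresh slot map.
                reinitialize()
    raise last_error
```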