Redisson cluster cannot recover from "missing slot"
A test program repeatedly issues a batch of requests (evalSha, for lock-free evaluation). The Redis Cluster was bounced in the middle of the run. The connections were lost and then rediscovered within seconds, as expected. However, Redisson remains stuck, unable to recover from this condition:
org.redisson.client.RedisNodeNotFoundException: Node for slot: 13907 hasn't been discovered yet. Check cluster slots coverage using CLUSTER NODES command. Increase value of retryAttempts and/or retryInterval settings.
at org.redisson.connection.MasterSlaveConnectionManager.createNodeNotFoundException(MasterSlaveConnectionManager.java:554)
at org.redisson.connection.MasterSlaveConnectionManager.connectionWriteOp(MasterSlaveConnectionManager.java:508)
at org.redisson.command.RedisExecutor.getConnection(RedisExecutor.java:534)
at org.redisson.command.RedisExecutor.execute(RedisExecutor.java:121)
at org.redisson.command.RedisExecutor$2.run(RedisExecutor.java:251)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
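For reference, the slot number in the exception comes from the key being addressed: Redis Cluster maps a key to a slot as CRC16(key) mod 16384, hashing only the hash tag when the key contains a non-empty {...} section. The sketch below is a standalone re-implementation of that rule using only the JDK, so a failing key can be matched to a slot without a live connection. The class and method names (KeySlot, slotFor) are made up for illustration and are not part of Redisson.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Redisson class: computes the Redis Cluster hash slot of a key.
final class KeySlot {

    // CRC16-CCITT, XMODEM variant (polynomial 0x1021, initial value 0),
    // which is the checksum the Redis Cluster specification prescribes.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    // Returns the slot (0..16383) for a key, honoring hash tags.
    static int slotFor(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            // Only a non-empty tag between the first '{' and the next '}' is used.
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // "123456789" is the CRC16 check string from the cluster specification.
        System.out.println(slotFor("123456789"));
        // Keys sharing a hash tag land in the same slot.
        System.out.println(slotFor("foo{hash_tag}") == slotFor("bar{hash_tag}"));
    }
}
```

The same value can be cross-checked against the server with CLUSTER KEYSLOT.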
Redis cluster state is fine:
127.0.0.1:6379> cluster nodes
fe0223c0b542837bd94699d1af3c19e4ac31178f 172.28.1.2:6379@16379 master - 0 1619718045643 2 connected 5461-10922
5d8b390298d2962cf69385d3e15b71dc432d5280 172.28.1.5:6379@16379 slave af2498fa4bb5ad08751bfb52a954a36964510e59 0 1619718045000 1 connected
af2498fa4bb5ad08751bfb52a954a36964510e59 172.28.1.1:6379@16379 myself,master - 0 1619718043000 1 connected 0-5460
9e3493660fd571eb3454de79fd85776bfe3f81e5 172.28.1.3:6379@16379 master - 0 1619718044000 3 connected 10923-16383
9e23832ccbefdd2d20a410848fecc092efd78b1e 172.28.1.4:6379@16379 slave 9e3493660fd571eb3454de79fd85776bfe3f81e5 0 1619718045000 3 connected
76585e6d75f61b98178df0c6aafcba6a0de0d8b3 172.28.1.6:6379@16379 slave fe0223c0b542837bd94699d1af3c19e4ac31178f 0 1619718044541 2 connected
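To back up the claim that the cluster itself is healthy, the output above can be checked mechanically. The following is a rough sketch, JDK only, that parses CLUSTER NODES text, records which master claims each slot, and reports any uncovered slots. SlotCoverage, ownersFrom and uncovered are illustrative names, and the parser assumes the line layout shown above (slot ranges starting at the ninth field).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Redisson: checks slot coverage from CLUSTER NODES text.
final class SlotCoverage {
    static final int SLOT_COUNT = 16384;

    // The CLUSTER NODES output quoted in this issue.
    static final String SAMPLE = String.join("\n",
        "fe0223c0b542837bd94699d1af3c19e4ac31178f 172.28.1.2:6379@16379 master - 0 1619718045643 2 connected 5461-10922",
        "5d8b390298d2962cf69385d3e15b71dc432d5280 172.28.1.5:6379@16379 slave af2498fa4bb5ad08751bfb52a954a36964510e59 0 1619718045000 1 connected",
        "af2498fa4bb5ad08751bfb52a954a36964510e59 172.28.1.1:6379@16379 myself,master - 0 1619718043000 1 connected 0-5460",
        "9e3493660fd571eb3454de79fd85776bfe3f81e5 172.28.1.3:6379@16379 master - 0 1619718044000 3 connected 10923-16383",
        "9e23832ccbefdd2d20a410848fecc092efd78b1e 172.28.1.4:6379@16379 slave 9e3493660fd571eb3454de79fd85776bfe3f81e5 0 1619718045000 3 connected",
        "76585e6d75f61b98178df0c6aafcba6a0de0d8b3 172.28.1.6:6379@16379 slave fe0223c0b542837bd94699d1af3c19e4ac31178f 0 1619718044541 2 connected");

    // Maps every slot to the address of the master claiming it (null if unclaimed).
    static String[] ownersFrom(String clusterNodesOutput) {
        String[] owners = new String[SLOT_COUNT];
        for (String line : clusterNodesOutput.split("\\R")) {
            // Fields: id, ip:port@cport, flags, master-id, ping, pong, epoch, link-state, slots...
            String[] f = line.trim().split("\\s+");
            if (f.length < 9 || !f[2].contains("master")) {
                continue; // replicas and slot-less nodes own nothing
            }
            String address = f[1].split("@")[0];
            for (int i = 8; i < f.length; i++) {
                if (f[i].startsWith("[")) {
                    continue; // slot in migrating/importing state, skipped in this sketch
                }
                String[] range = f[i].split("-");
                int from = Integer.parseInt(range[0]);
                int to = Integer.parseInt(range[range.length - 1]);
                for (int s = from; s <= to; s++) {
                    owners[s] = address;
                }
            }
        }
        return owners;
    }

    // Lists the slots that no master claims.
    static List<Integer> uncovered(String[] owners) {
        List<Integer> missing = new ArrayList<>();
        for (int s = 0; s < owners.length; s++) {
            if (owners[s] == null) {
                missing.add(s);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String[] owners = ownersFrom(SAMPLE);
        System.out.println("uncovered slots: " + uncovered(owners).size());
        System.out.println("owner of slot 13907: " + owners[13907]);
    }
}
```

For the output quoted here, slot 13907 falls in the 10923-16383 range held by 172.28.1.3:6379 and no slot is left unclaimed, which is consistent with the stale state being on the client side.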
Expected behavior
Eventually discover the hash slot is available.
Actual behavior
Retrying the same sequence of operations does not recover even after 5 minutes.
Steps to reproduce or test case
- Start Redis cluster
- Loop evalSha across all cluster nodes.
- Bounce Redis cluster in the middle of the loop
- Observe the failures
- Observe that restarting Redisson (bouncing the microservice, in this case) recovers, so the issue is not in the Redis cluster.
Redis version
127.0.0.1:6379> info
# Server
redis_version:6.2.1
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:fa5dfffb0053744e
redis_mode:cluster
os:Linux 4.14.35-1902.3.1.el7uek.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:8.3.0
process_id:1
process_supervised:no
run_id:84f6b708cd93f23cb33317537d55220e8bd94b61
tcp_port:6379
server_time_usec:1619718169208450
uptime_in_seconds:1922
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:9105433
executable:/data/redis-server
config_file:/usr/local/etc/redis/redis.conf
io_threads_active:0
Redisson version
3.15.4
Redisson configuration
@ApplicationScoped
public class RedissonClientProvider {
    public RedissonClientProvider() {
    }

    @Produces
    @ApplicationScoped
    public RedissonClient cluster(@ConfigProperty(name = "jedis.cluster.addresses") String[] addresses) {
        Config config = new Config();
        config.useClusterServers()
              .addNodeAddress(Arrays.stream(addresses).map(a -> "redis://" + a).toArray(n -> new String[n]))
              .setRetryAttempts(3);
        return Redisson.create(config);
    }
}
Config: just a list of hostname:port addresses, no password.
The cluster is the minimal set of nodes: 6 Redis containers. The dataset is tiny (100K 10-byte strings).
/usr/local/etc/redis/redis.conf:
port 6379
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
Can you try it with the attached version?
redisson-3.15.5-SNAPSHOT.jar.zip
Thanks for testing! Fixed