Redisson cluster cannot recover from "missing slot"

A test program repeatedly sends a large number of requests (evalSha, for lock-free evaluation). In the middle of the run, I bounced the Redis Cluster. The connections were lost and then rediscovered within seconds, as expected. However, Redisson gets stuck and cannot recover from this condition:

org.redisson.client.RedisNodeNotFoundException: Node for slot: 13907 hasn't been discovered yet. Check cluster slots coverage using CLUSTER NODES command. Increase value of retryAttempts and/or retryInterval settings.
	at org.redisson.connection.MasterSlaveConnectionManager.createNodeNotFoundException(MasterSlaveConnectionManager.java:554)
	at org.redisson.connection.MasterSlaveConnectionManager.connectionWriteOp(MasterSlaveConnectionManager.java:508)
	at org.redisson.command.RedisExecutor.getConnection(RedisExecutor.java:534)
	at org.redisson.command.RedisExecutor.execute(RedisExecutor.java:121)
	at org.redisson.command.RedisExecutor$2.run(RedisExecutor.java:251)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 more
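The slot number in the exception comes from Redis Cluster's key-to-slot mapping: the CRC16 (XMODEM variant) of the key, or of its hash tag when one is present, modulo 16384. The sketch below follows the Redis Cluster specification using only the JDK; the class and method names (`KeySlot`, `keySlot`, `crc16`) are hypothetical and are not part of Redisson's API.

```java
import java.nio.charset.StandardCharsets;

public class KeySlot {
    static final int SLOT_COUNT = 16384;

    // CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection.
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    // If the key has a non-empty "{...}" section (first '{' to the next '}'),
    // only that section is hashed, so related keys land in the same slot.
    static int keySlot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        System.out.println(keySlot("123456789")); // 12739 (CRC16 check value 0x31C3)
        System.out.println(keySlot("{user1000}.following") == keySlot("{user1000}.followers")); // true
    }
}
```

Whatever key produced slot 13907, that slot is fixed by the key alone; the failure is about Redisson not finding a node for it, not about the slot itself changing.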

Redis cluster state is fine:

127.0.0.1:6379> cluster nodes
fe0223c0b542837bd94699d1af3c19e4ac31178f 172.28.1.2:6379@16379 master - 0 1619718045643 2 connected 5461-10922
5d8b390298d2962cf69385d3e15b71dc432d5280 172.28.1.5:6379@16379 slave af2498fa4bb5ad08751bfb52a954a36964510e59 0 1619718045000 1 connected
af2498fa4bb5ad08751bfb52a954a36964510e59 172.28.1.1:6379@16379 myself,master - 0 1619718043000 1 connected 0-5460
9e3493660fd571eb3454de79fd85776bfe3f81e5 172.28.1.3:6379@16379 master - 0 1619718044000 3 connected 10923-16383
9e23832ccbefdd2d20a410848fecc092efd78b1e 172.28.1.4:6379@16379 slave 9e3493660fd571eb3454de79fd85776bfe3f81e5 0 1619718045000 3 connected
76585e6d75f61b98178df0c6aafcba6a0de0d8b3 172.28.1.6:6379@16379 slave fe0223c0b542837bd94699d1af3c19e4ac31178f 0 1619718044541 2 connected
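To confirm that slot 13907 is covered, one can match it against the slot ranges in the `CLUSTER NODES` output above. This is a minimal JDK-only sketch with a hypothetical `SlotOwner` class: it reads slot ranges from the ninth field onward on master lines, skips importing/migrating markers, and is not how Redisson itself tracks topology.

```java
import java.util.List;
import java.util.Optional;

public class SlotOwner {
    // Returns the ip:port of the master whose slot ranges include `slot`.
    static Optional<String> ownerOf(List<String> clusterNodesLines, int slot) {
        for (String line : clusterNodesLines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 9 || !f[2].contains("master")) {
                continue; // replicas, and masters without slots, own nothing
            }
            String address = f[1].split("@")[0]; // drop the cluster-bus port
            for (int i = 8; i < f.length; i++) {
                if (f[i].startsWith("[")) {
                    continue; // skip [slot->-id] / [slot-<-id] migration markers
                }
                String[] range = f[i].split("-");
                int from = Integer.parseInt(range[0]);
                int to = range.length > 1 ? Integer.parseInt(range[1]) : from;
                if (slot >= from && slot <= to) {
                    return Optional.of(address);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "fe0223c0b542837bd94699d1af3c19e4ac31178f 172.28.1.2:6379@16379 master - 0 1619718045643 2 connected 5461-10922",
            "af2498fa4bb5ad08751bfb52a954a36964510e59 172.28.1.1:6379@16379 myself,master - 0 1619718043000 1 connected 0-5460",
            "9e3493660fd571eb3454de79fd85776bfe3f81e5 172.28.1.3:6379@16379 master - 0 1619718044000 3 connected 10923-16383");
        System.out.println(ownerOf(lines, 13907).orElse("not covered")); // 172.28.1.3:6379
    }
}
```

Against this output, slot 13907 belongs to the master at 172.28.1.3:6379, and all 16384 slots are covered by the three masters, which supports the report that the cluster itself is healthy.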

Expected behavior

Redisson should eventually discover that the hash slot is available again.

Actual behavior

Retrying the same sequence of operations still fails, even after 5 minutes.

Steps to reproduce or test case

  1. Start the Redis cluster.
  2. Loop evalSha across all cluster nodes.
  3. Bounce the Redis cluster in the middle of the loop.
  4. Observe the failures.

Note that restarting the Redisson client (in this case, bouncing the microservice) recovers, so the issue is not in the Redis cluster.
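The "does it ever recover?" check from these steps can be sketched as a small JDK-only harness that keeps calling one operation and reports how many calls it took to succeed again after the first failure. This is a minimal sketch under assumptions: `RecoveryProbe` and `callsUntilRecovery` are hypothetical names, and in the real test the operation would be the Redisson evalSha call; a stand-in `Callable` is used here so the harness runs without a cluster.

```java
import java.util.concurrent.Callable;

public class RecoveryProbe {
    // Calls op up to maxCalls times, pausing pauseMillis between calls.
    // Returns how many calls after the first failure it took to succeed again,
    // or -1 if no failure was seen or the operation never recovered.
    static int callsUntilRecovery(Callable<?> op, int maxCalls, long pauseMillis) throws InterruptedException {
        int firstFailure = -1;
        for (int i = 0; i < maxCalls; i++) {
            boolean ok;
            try {
                op.call();
                ok = true;
            } catch (Exception e) {
                ok = false;
            }
            if (!ok && firstFailure < 0) {
                firstFailure = i; // outage observed, e.g. the cluster was bounced
            } else if (ok && firstFailure >= 0) {
                return i - firstFailure; // first success after the outage
            }
            Thread.sleep(pauseMillis);
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for evalSha: succeeds twice, fails three times, then succeeds.
        int[] n = {0};
        Callable<Object> flaky = () -> {
            int call = n[0]++;
            if (call >= 2 && call < 5) {
                throw new IllegalStateException("node for slot not discovered");
            }
            return "OK";
        };
        System.out.println(callsUntilRecovery(flaky, 100, 0)); // 3
    }
}
```

With a real client and a budget of roughly 5 minutes, the behavior reported here corresponds to the probe returning -1, while the expected behavior is a small positive number shortly after the cluster comes back.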

Redis version

127.0.0.1:6379> info
# Server
redis_version:6.2.1
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:fa5dfffb0053744e
redis_mode:cluster
os:Linux 4.14.35-1902.3.1.el7uek.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:8.3.0
process_id:1
process_supervised:no
run_id:84f6b708cd93f23cb33317537d55220e8bd94b61
tcp_port:6379
server_time_usec:1619718169208450
uptime_in_seconds:1922
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:9105433
executable:/data/redis-server
config_file:/usr/local/etc/redis/redis.conf
io_threads_active:0

Redisson version

3.15.4

Redisson configuration

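import java.util.Arrays;

import javax.enterprise.context.ApplicationScoped;
import javax.enterprise.inject.Produces;

import org.eclipse.microprofile.config.inject.ConfigProperty;
import org.redisson.Redisson;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

@ApplicationScoped
public class RedissonClientProvider {
   public RedissonClientProvider() {
   }

   // Produces one shared cluster-mode RedissonClient from host:port seed addresses.
   @Produces
   @ApplicationScoped
   public RedissonClient cluster(@ConfigProperty(name = "jedis.cluster.addresses") String[] addresses) {
      Config config = new Config();
      config.useClusterServers()
            // Redisson expects a redis:// scheme on every seed node address
            .addNodeAddress(Arrays.stream(addresses).map(a -> "redis://" + a).toArray(String[]::new))
            // Number of retry attempts before a command fails with an error
            .setRetryAttempts(3);
      return Redisson.create(config);
   }
}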

Config: just a list of hostname:port addresses, no password.

The cluster is the minimal set of nodes: 6 Redis containers (3 masters, 3 replicas). The dataset is tiny (100K 10-byte strings).

/usr/local/etc/redis/redis.conf:

port 6379
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
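The exception message suggests increasing `retryAttempts` and/or `retryInterval`. As a rough back-of-the-envelope comparison, assuming Redisson's documented default `retryInterval` of 1500 ms, the `setRetryAttempts(3)` from the configuration above, and ignoring per-attempt command timeouts, the retry window can be set against `cluster-node-timeout 5000`. The `RetryWindow` helper is hypothetical. A window shorter than the node timeout could explain individual commands failing while the cluster is being bounced, but it does not explain the failure to recover minutes later.

```java
public class RetryWindow {
    // Rough estimate of how long a command keeps being retried before the
    // "hasn't been discovered yet" error surfaces: attempts x interval.
    // Assumption: per-attempt command timeouts are ignored.
    static long retryWindowMillis(int retryAttempts, long retryIntervalMillis) {
        return retryAttempts * retryIntervalMillis;
    }

    public static void main(String[] args) {
        long window = retryWindowMillis(3, 1500); // setRetryAttempts(3), assumed default retryInterval
        long nodeTimeout = 5000;                  // cluster-node-timeout from redis.conf
        System.out.println(window + " ms retry window vs " + nodeTimeout + " ms cluster-node-timeout");
        // prints: 4500 ms retry window vs 5000 ms cluster-node-timeout
    }
}
```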

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
mrniko commented, May 3, 2021

Can you try it with the attached version?

redisson-3.15.5-SNAPSHOT.jar.zip

0 reactions
mrniko commented, May 4, 2021

Thanks for testing! Fixed.

