question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Unable to unfreeze entry when redis goes down in kuberentes (bitnami/redis)

See original GitHub issue

Expected behavior If the redis sentinel master goes down for a short amount of time, and comes back up in less than down-after-milliseconds (i.e. a new master is not picked), redisson should continue to use the master

Actual behavior The pod continuously streams the following output:

2022-10-06 16:14:29.868 ERROR 1 --- [isson-netty-2-3] o.r.c.balancer.LoadBalancerManager       : Unable to unfreeze entry: [freeSubscribeConnectionsAmount=0, freeSubscribeConnectionsCounter=value:87:queue:0, freeConnectionsAmount=0, freeConnectionsCounter=value:138:queue:0, freezeReason=SYSTEM, client=[addr=redis://10.0.3.173:6379], nodeType=SLAVE, firstFail=1665072869858]

java.util.concurrent.CompletionException: org.redisson.client.RedisConnectionException: Unable to connect to Redis server: 10.0.3.173/10.0.3.173:6379
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
	at java.base/java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1423)
	at java.base/java.util.concurrent.CompletableFuture$CoCompletion.tryFire(CompletableFuture.java:1144)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$1(ConnectionPool.java:154)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at org.redisson.connection.pool.ConnectionPool.promiseFailure(ConnectionPool.java:317)
	at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$7(ConnectionPool.java:276)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at org.redisson.client.RedisClient$2$2.run(RedisClient.java:307)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.redisson.client.RedisConnectionException: Unable to connect to Redis server: 10.0.3.173/10.0.3.173:6379
	at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$1(ConnectionPool.java:153)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at org.redisson.connection.pool.ConnectionPool.promiseFailure(ConnectionPool.java:317)
	at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$7(ConnectionPool.java:276)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at org.redisson.client.RedisClient$1$2.run(RedisClient.java:250)
	... 8 common frames omitted
Caused by: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.0.3.173/10.0.3.173:6379
	at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
	at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
	at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
	... 11 common frames omitted
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.0.3.173/10.0.3.173:6379
Caused by: java.net.NoRouteToHostException: Host is unreachable
	at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)

And:

Caused by: org.redisson.client.RedisConnectionException: Unable to connect to Redis server: 10.0.3.173/10.0.3.173:6379
	at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$1(ConnectionPool.java:153)
	... 19 common frames omitted
Caused by: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.0.3.173/10.0.3.173:6379
	at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
	at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
	at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
	... 11 common frames omitted
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.0.3.173/10.0.3.173:6379
Caused by: java.net.NoRouteToHostException: Host is unreachable
	at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)

Steps to reproduce or test case

  1. Install redis in sentinel configuration using the latest bitnami helm chart
helm install "${APP_NAME}-redis" bitnami/redis \
             --version "17.3.1" \
             --set sentinel.enabled=true \
             --set auth.password="${REDIS_PASSWORD}" \
             --set replica.replicaCount=3 \
             --set global.redis.password="${REDIS_PASSWORD}"
  1. Now, you have 3 redis nods running:
kubectl get pods | grep redis
test2021-redis-node-0                   2/2     Running     0             14m
test2021-redis-node-1                   2/2     Running     0             14m
test2021-redis-node-2                   2/2     Running     0             13m
  1. Some information about the sentinel master:
kubectl exec -it test2021-redis-node-1 bash -c sentinel
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
I have no name!@test2021-redis-node-1:/$ redis-cli -a $REDIS_PASSWORD -p 26379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:26379> sentinel master mymaster
 1) "name"
 2) "mymaster"
 3) "ip"
 4) "test2021-redis-node-0.test2021-redis-headless.default.svc.cluster.local"
 ...
 21) "down-after-milliseconds"
22) "60000"
....
32) "failover-timeout"
33) 180000
....
  1. As you can see, node 0 is the master, so let’s kill it:
kubectl delete pods/test2021-redis-node-0 --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "test2021-redis-node-0" force deleted
  1. Node 0 stays as the master, since it was down for ~10 seconds, which is less than the value of down-after-milliseconds, which is set to 60s. So the output of step 3 is still the same.

  2. The application server continuously streams the exceptions which I sent, and keeps streaming them long after (I monitored the logs for ~5 minutes) the master came back up. The pod is not useable until it restarts (via failed healthcheck or manually, at that point it doesn’t matter).

Redis version 7.0.5

Redisson version 3.17.7

Redisson configuration

---
sentinelServersConfig:
  idleConnectionTimeout: 10000
  connectTimeout: 10000
  timeout: 5000
  retryAttempts: 3
  retryInterval: 1500
  failedSlaveReconnectionInterval: 3000
  failedSlaveCheckInterval: 60000
  subscriptionsPerConnection: 5
  clientName: null
  loadBalancer: !<org.redisson.connection.balancer.RoundRobinLoadBalancer> {}
  subscriptionConnectionMinimumIdleSize: 1
  subscriptionConnectionPoolSize: 50
  slaveConnectionMinimumIdleSize: 24
  slaveConnectionPoolSize: 64
  masterConnectionMinimumIdleSize: 24
  masterConnectionPoolSize: 64
  readMode: "SLAVE"
  subscriptionMode: "SLAVE"
  password: "whatever"
  sentinelAddresses:
  - "redis://test2021-redis-headless:26379"
  masterName: "mymaster"
  keepAlive: true
  database: 0
  checkSentinelsList: false
eventLoopGroup: null
threads: 16
nettyThreads: 32
codec: !<org.redisson.codec.SnappyCodec> {}
transportMode: "NIO"

#https://github.com/redisson/redisson/issues/4356
addressResolverGroupFactory: !<org.company.redis.CustomDnsAddressResolverGroupFactory> {}

And the addressResolver:

/**
 * Required to fix https://github.com/redisson/redisson/issues/4356
 * @author lbornov2
 *
 */
public class CustomDnsAddressResolverGroupFactory implements AddressResolverGroupFactory {
    
    @Override
    public DnsAddressResolverGroup create(Class<? extends DatagramChannel> channelType,
            DnsServerAddressStreamProvider nameServerProvider) {
        final DnsAddressResolverGroup group = new DnsAddressResolverGroup(new DnsNameResolverBuilder()
                .channelType(NioDatagramChannel.class)
                .authoritativeDnsServerCache(NoopAuthoritativeDnsServerCache.INSTANCE)
                .nameServerProvider(DnsServerAddressStreamProviders.platformDefault()));
        return group;
    }

}

Issue Analytics

  • State:closed
  • Created a year ago
  • Comments:34 (16 by maintainers)

github_iconTop GitHub Comments

2reactions
mrnikocommented, Oct 11, 2022

Thanks for the logs. Please try a new build below.

redisson-3.17.8-SNAPSHOT.jar.zip

0reactions
olessiocommented, Dec 1, 2022

We use 3.18.0 release, initial problem has been fixed: Redisson manages to reconnect. But, recently we saw that one of our services which uses Redisson client was polluting the log after the reconnect with next exceptions:

ERROR 1 [] [redisson-timer-4-1] org.redisson.connection.balancer.LoadBalancerManager : Unable to unfreeze entry: [freeSubscribeConnectionsAmount=0, freeSubscribeConnectionsCounter=value:85:queue:0, freeConnectionsAmount=0, freeConnectionsCounter=value:134:queue:0, freezeReason=SYSTEM, client=[addr=redis://10.83.19.20:6379], nodeType=SLAVE, firstFail=1669808100722]

 java.util.concurrent.CompletionException: org.redisson.client.RedisConnectionException: Unable to init enough connections amount! Only 0 of 4 were initialized. Server: 10.83.19.20/10.83.19.20:6379
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
	at java.base/java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1452)
	at java.base/java.util.concurrent.CompletableFuture.allOf(CompletableFuture.java:2337)
	at org.redisson.connection.balancer.LoadBalancerManager.unfreeze(LoadBalancerManager.java:160)
	at org.redisson.connection.balancer.LoadBalancerManager.lambda$unfreeze$2(LoadBalancerManager.java:166)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:669)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:744)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:469)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.redisson.client.RedisConnectionException: Unable to init enough connections amount! Only 0 of 4 were initialized. Server: 10.83.19.20/10.83.19.20:6379
	at org.redisson.connection.pool.ConnectionPool.createConnection(ConnectionPool.java:104)
	at org.redisson.connection.pool.ConnectionPool.initConnections(ConnectionPool.java:92)
	at org.redisson.connection.pool.ConnectionPool.initConnections(ConnectionPool.java:77)
	at org.redisson.connection.pool.SlaveConnectionPool.initConnections(SlaveConnectionPool.java:30)
	at org.redisson.connection.balancer.LoadBalancerManager.unfreeze(LoadBalancerManager.java:157)
	... 6 common frames omitted

10.83.19.20 - old redis IP, which no longer exists after redis has been restarted by kubernetes. Unfortunately, I have no exact steps to reproduce as it happened only once and in only one our service (all others have been reconnected as well and didn’t throw these exceptions).

Read more comments on GitHub >

github_iconTop Results From Across the Web

Unfreeze failed with redis in kubernetes · Issue #4356 - GitHub
We run a 3-node cluster using bitnami/redis helm chart, version 16.9.2 . We have 3 redis and 3 sentinel nodes. Our relevant configuration...
Read more >
Redis-cli cannot connect from a vnet to a Redis Cluster ...
This is because, the redis cli is connected to the cluster, if the data is in the same node it is connected, it...
Read more >
redis-cluster 7.5.2 · bitnami/bitnami - Artifact Hub
This chart bootstraps a Redis™ deployment on a Kubernetes cluster using the Helm package manager. Bitnami charts can be used with Kubeapps for...
Read more >
Deploy and Operate a Redis Cluster in Kubernetes | by Mark Lu
It seems some node IP in the cluster is not accessible at all and causing random connection issue for the Redis client applications....
Read more >
Connecting to a Redis instance from a Google Kubernetes ...
This sample HTTP server application establishes a connection to a Redis instance from a Google Kubernetes Engine cluster. Clone the repository for your...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found