
ConnectionWatchdog tries to reconnect to the node's previous IP


Bug Report

Lettuce's ConnectionWatchdog keeps trying to connect to the old IP addresses of Redis nodes.

Current Behavior

  • Cluster with 6 nodes (3 master nodes, each with one replica), running under Kubernetes
  • Redis nodes are restarted one by one, each getting a new IP address
  • The ConnectionWatchdog doesn’t seem to use the new addresses and keeps connecting to the previous IPs.

RoundRobinSocketAddressSupplier resolves node 3da56d06b4a34c20ee560d3ed28a2679ba089a30 to 10.6.21.237, but the CLUSTER reply decoded by RedisStateMachine lists it as 10.6.37.76.

Log messages
09:08:44.918    DEBUG           RedisStateMachine        Decoded LatencyMeteredCommand [type=CLUSTER, output=StatusOutput [output=
ff06a774b13ad88a63b39dcd0c9325caf3e4fa16 10.6.34.87:6379@16379 master - 0 1594717721000 90 connected 10923-16383
72641dc0aa44972fd1ffd31ad1342cbcfd01b4fb 10.6.26.31:6379@16379 myself,slave ff06a774b13ad88a63b39dcd0c9325caf3e4fa16 0 1594717722000 84 connected
9bc0e2cb46f424cac90d8816485d1a2728919765 10.6.13.141:6379@16379 slave b4cb4eb98ca653e888d3b2ec931898ab7c97b867 0 1594717723043 92 connected
b4cb4eb98ca653e888d3b2ec931898ab7c97b867 10.6.21.140:6379@16379 master - 0 1594717722039 92 connected 0-5460
fbc6db4e984a9142efd0c5c7ab01b2f21abb0787 10.6.11.155:6379@16379 master - 0 1594717721036 86 connected 5461-10922
3da56d06b4a34c20ee560d3ed28a2679ba089a30 10.6.37.76:6379@16379 slave fbc6db4e984a9142efd0c5c7ab01b2f21abb0787 0 1594717724047 86 connected
, error='null'], commandType=io.lettuce.core.cluster.topology.TimedAsyncCommand], empty stack: true

09:09:08.805    DEBUG           RoundRobinSocketAddressSupplier                Resolved SocketAddress 10.6.21.237:6379 using for Cluster node 3da56d06b4a34c20ee560d3ed28a2679ba089a30
09:09:08.806    DEBUG           ReconnectionHandler      Reconnecting to Redis at 10.6.21.237:6379

09:09:08.844    DEBUG           RedisStateMachine         Decoded LatencyMeteredCommand [type=CLUSTER, output=StatusOutput [output=
ff06a774b13ad88a63b39dcd0c9325caf3e4fa16 10.6.34.87:6379@16379 master - 0 1594717747000 90 connected 10923-16383
9bc0e2cb46f424cac90d8816485d1a2728919765 10.6.13.141:6379@16379 slave b4cb4eb98ca653e888d3b2ec931898ab7c97b867 0 1594717745384 92 connected
3da56d06b4a34c20ee560d3ed28a2679ba089a30 10.6.37.76:6379@16379 slave fbc6db4e984a9142efd0c5c7ab01b2f21abb0787 0 1594717747390 86 connected
fbc6db4e984a9142efd0c5c7ab01b2f21abb0787 10.6.11.155:6379@16379 myself,master - 0 1594717744000 86 connected 5461-10922
b4cb4eb98ca653e888d3b2ec931898ab7c97b867 10.6.21.140:6379@16379 master - 0 1594717748401 92 connected 0-5460
72641dc0aa44972fd1ffd31ad1342cbcfd01b4fb 10.6.26.31:6379@16379 slave ff06a774b13ad88a63b39dcd0c9325caf3e4fa16 0 1594717747000 90 connected
, error='null'], commandType=io.lettuce.core.cluster.topology.TimedAsyncCommand], empty stack: true

09:09:18.824    DEBUG           ConnectionWatchdog      [channel=0x03ba1129, /10.6.13.138:40214 -> /10.6.21.237:6379, last known addr=/10.6.21.237:6379] scheduleReconnect()

09:09:18.824    DEBUG           ConnectionWatchdog       Cannot reconnect to [10.6.21.237:6379]: connection timed out: /10.6.21.237:6379 io.netty.channel.ConnectTimeoutException: connection timed out: /10.6.21.237:6379
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:834)

09:09:18.825    DEBUG           ConnectionWatchdog        [channel=0x03ba1129, /10.6.13.138:40214 -> /10.6.21.237:6379, last known addr=/10.6.21.237:6379] Reconnect attempt 61, delay 30000ms
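To confirm which address the cluster itself advertises for a node, the decoded CLUSTER replies above can be reduced to a node ID to host:port map. The helper below is a hypothetical diagnostic sketch (ClusterNodesParser and nodeAddresses are illustrative names, not Lettuce API); it assumes the Redis 6 CLUSTER NODES line layout, where the first field is the node ID and the second is ip:port@cport.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper (not part of Lettuce).
// Assumed CLUSTER NODES line layout (space-separated):
// <id> <ip:port@cport> <flags> <master> <ping-sent> <pong-recv> <config-epoch> <link-state> [slots...]
final class ClusterNodesParser {

    private ClusterNodesParser() {
    }

    // Returns node ID -> "host:port" in the order the lines appear.
    static Map<String, String> nodeAddresses(String clusterNodesOutput) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : clusterNodesOutput.split("\n")) {
            String trimmed = line.trim(); // also strips a trailing '\r'
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length < 2) {
                continue; // not a node line
            }
            String address = fields[1];
            int busPortSeparator = address.indexOf('@');
            if (busPortSeparator >= 0) {
                // Drop the cluster bus port, e.g. "@16379".
                address = address.substring(0, busPortSeparator);
            }
            result.put(fields[0], address);
        }
        return result;
    }
}
```

Fed either CLUSTER reply from the log, it maps 3da56d06b4a34c20ee560d3ed28a2679ba089a30 to 10.6.37.76:6379, while the reconnect attempts above target 10.6.21.237:6379.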

Lettuce Configuration

Relevant Spring Boot Configuration
spring.redis.lettuce.cluster.refresh.adaptive=true
spring.redis.lettuce.cluster.refresh.period=1M

# custom property, mapped to Lettuce's dynamicRefreshSources option
custom.redis.lettuce.cluster.refresh.dynamic-sources=false

spring.redis.cluster.nodes=\
  redis-cluster-0.redis-cluster:6379,\
  redis-cluster-1.redis-cluster:6379,\
  redis-cluster-2.redis-cluster:6379,\
  redis-cluster-3.redis-cluster:6379,\
  redis-cluster-4.redis-cluster:6379,\
  redis-cluster-5.redis-cluster:6379

Expected behavior/code

ConnectionWatchdog should stop trying to connect to a previous IP address of a Redis node (which is now known to have another IP address).
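A minimal sketch of the expected behavior, assuming the reconnect logic can look up the node's ID in the most recent topology before each attempt. NodeAddressResolver and its Supplier-based topology view are hypothetical and do not reflect how Lettuce's ConnectionWatchdog is actually wired.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical sketch: choose the reconnect target by node ID from the
// latest topology snapshot instead of reusing the last known address.
final class NodeAddressResolver {

    private final String nodeId;
    // Assumed view of the current topology: node ID -> "host:port".
    private final Supplier<Map<String, String>> latestTopology;
    private String lastKnownAddress;

    NodeAddressResolver(String nodeId, String initialAddress,
                        Supplier<Map<String, String>> latestTopology) {
        this.nodeId = Objects.requireNonNull(nodeId);
        this.lastKnownAddress = Objects.requireNonNull(initialAddress);
        this.latestTopology = Objects.requireNonNull(latestTopology);
    }

    // Called before every reconnect attempt.
    String nextReconnectAddress() {
        String current = latestTopology.get().get(nodeId);
        if (current != null) {
            // The topology reports an address for this node: prefer it.
            lastKnownAddress = current;
        }
        // If the node is missing from the topology, keep the last known address.
        return lastKnownAddress;
    }
}
```

Falling back to the last known address when the node is absent keeps a transient gap in the topology from leaving the connection with no target at all.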

Environment

  • Lettuce version(s): 5.3.1.RELEASE
  • Redis version: 6.0.1
  • Spring Boot: 2.3.1

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
mp911de commented, Jul 17, 2020

All DNS resolution is handled by SocketAddressResolver and DnsResolver. After looking into RoundRobinSocketAddressSupplier, it seems that DNS resolution isn’t involved at all, as RoundRobinSocketAddressSupplier is based on the initial Partitions object. Whenever a reconnect occurs, RoundRobinSocketAddressSupplier is asked to provide a new endpoint to connect to. If the Partitions change between calls to RoundRobinSocketAddressSupplier.get(), the RoundRobin is rebuilt.

For some reason, this doesn’t happen in your case. If you are able to reproduce the issue, please step into RoundRobinSocketAddressSupplier.get() to capture the state of partitions and the inner state of RoundRobin to see where the mismatch stems from.
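The rebuild-on-change behavior described in this comment can be modeled in a few lines. This is a hypothetical simplification, not Lettuce's RoundRobinSocketAddressSupplier: RebuildingRoundRobin reads a list of node addresses from a Supplier and, as an assumption of this sketch, detects a topology change by comparing that list by value with the snapshot it last built from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical model of a round-robin address supplier that rebuilds its
// rotation whenever the topology it reads from has changed.
final class RebuildingRoundRobin implements Supplier<String> {

    private final Supplier<List<String>> partitions; // current node addresses
    private List<String> snapshot = List.of();       // addresses the rotation was built from
    private int next;                                // index of the next address to hand out

    RebuildingRoundRobin(Supplier<List<String>> partitions) {
        this.partitions = Objects.requireNonNull(partitions);
    }

    @Override
    public synchronized String get() {
        List<String> current = partitions.get();
        if (!current.equals(snapshot)) {
            // Topology changed since the last call: rebuild from a copy of the fresh view.
            snapshot = new ArrayList<>(current);
            next = 0;
        }
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("No nodes in topology");
        }
        String address = snapshot.get(next);
        next = (next + 1) % snapshot.size();
        return address;
    }
}
```

If the supplier kept reading the snapshot taken from the initial Partitions object, the rotation would never be rebuilt, which would be consistent with the stale 10.6.21.237 target in the log.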

0 reactions
1zg12 commented, May 12, 2022

I am facing the same exception as well, with an almost entirely default Lettuce configuration. It only happens with a larger data set.

These are the configs:

redis:
  lettuce:
    pool:
      max-active: 10
      enabled: true

This works with a smaller data set and always fails with a large one.


