
Takes 20 minutes to reconnect to Redis

See original GitHub issue

Bug Report

Current Behavior

Upon losing the connection to Redis, it takes around 15 minutes for the connection to be re-established. We consistently see the following repeated in the logs until the connection comes back.
19-10-2020 22:19:34.025 DEBUG r.n.http.server.HttpServerOperations  [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] New http connection, requesting read 
19-10-2020 22:19:34.025 DEBUG r.netty.channel.BootstrapHandlers  [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Initialized pipeline DefaultChannelPipeline{(BootstrapHandlers$BootstrapInitializerHandler#0 = reactor.netty.channel.BootstrapHandlers$BootstrapInitializerHandler), (reactor.left.httpCodec = io.netty.handler.codec.http.HttpServerCodec), (reactor.left.httpTrafficHandler = reactor.netty.http.server.HttpTrafficHandler), (reactor.right.reactiveBridge = reactor.netty.channel.ChannelOperationsHandler)} 
19-10-2020 22:19:34.025 DEBUG r.n.http.server.HttpServerOperations  [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Increasing pending responses, now 1 
19-10-2020 22:19:34.025 DEBUG reactor.netty.http.server.HttpServer  [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Handler is being applied: org.springframework.http.server.reactive.ReactorHttpHandlerAdapter@7ad3dcc9 
19-10-2020 22:19:34.028 DEBUG r.n.http.server.HttpServerOperations  [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Detected non persistent http connection, preparing to close 
19-10-2020 22:19:34.029 DEBUG r.n.http.server.HttpServerOperations  [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Last HTTP response frame 
19-10-2020 22:19:34.029 DEBUG r.n.http.server.HttpServerOperations  [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Last HTTP packet was sent, terminating the channel 
19-10-2020 22:19:45.907 DEBUG r.n.http.server.HttpServerOperations  [id: 0xe376eb28, L:/10.0.0.123:8080 - R:/10.0.0.101:44736] New http connection, requesting read 

After 15 or so minutes, we then see:

stus2-test.redis.cache.windows.net/52.242.79.64:6380, chid=0x1] Unexpected exception during request: io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out 
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
19-10-2020 22:20:48.664 DEBUG i.l.core.protocol.ConnectionWatchdog correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, last known addr=obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380] userEventTriggered(ctx, SslCloseCompletionEvent(java.nio.channels.ClosedChannelException)) 
19-10-2020 22:20:48.664 DEBUG i.l.core.protocol.CommandHandler correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, chid=0x1] channelInactive() 
19-10-2020 22:20:48.664 DEBUG i.l.core.protocol.DefaultEndpoint correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, epid=0x1] deactivating endpoint handler 
19-10-2020 22:20:48.665 DEBUG i.l.core.protocol.DefaultEndpoint correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, epid=0x1] notifyQueuedCommands adding 3 command(s) to buffer 
19-10-2020 22:20:48.665 DEBUG i.l.core.protocol.CommandHandler correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, chid=0x1] channelInactive() done 
19-10-2020 22:20:48.665 DEBUG i.l.core.protocol.ConnectionWatchdog correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, last known addr=obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380] channelInactive() 
19-10-2020 22:20:48.665 DEBUG i.l.core.protocol.ConnectionWatchdog correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, last known addr=obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380] scheduleReconnect() 
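
The roughly 15-minute gap before the NativeIoException appears is consistent with the Linux kernel's default TCP retransmission budget (net.ipv4.tcp_retries2 defaults to 15 retries, roughly 15 minutes): if the peer disappears silently, the socket is only declared dead once every retransmission is exhausted, and only then does Lettuce's ConnectionWatchdog see channelInactive() and schedule a reconnect. A minimal sketch of one common mitigation, assuming a standalone Lettuce 5.x RedisClient (the URI below is a placeholder): enabling TCP keepalive so the OS probes the peer and surfaces a dead socket sooner.

import java.time.Duration;

import io.lettuce.core.ClientOptions;
import io.lettuce.core.RedisClient;
import io.lettuce.core.SocketOptions;

public class KeepAliveClient {

    public static RedisClient create() {
        // Placeholder URI; point this at your own cache.
        RedisClient client = RedisClient.create("rediss://example.redis.cache.windows.net:6380");
        client.setOptions(ClientOptions.builder()
                .socketOptions(SocketOptions.builder()
                        .keepAlive(true)                        // let the OS probe idle/dead peers
                        .connectTimeout(Duration.ofSeconds(10)) // bound each (re)connect attempt
                        .build())
                .build());
        return client;
    }
}

Note that with plain keepAlive(true) the probe schedule comes from the OS (net.ipv4.tcp_keepalive_time is often 2 hours by default), so the kernel settings may need tuning as well; Lettuce 6.1+ exposes finer-grained keep-alive options.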

Expected behavior/code

We expect the reconnect to happen in a minute or less.

Note, we only see this behavior in Azure Kubernetes. Running the same Docker image (i.e. the one deployed in the cloud) locally on our Macs, we see the reconnect happening in less than 1 minute.
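
One plausible explanation for the Azure-only behavior (not confirmed in this issue): Azure's load balancer drops idle TCP flows after its idle timeout without sending an RST to either side, so the client only notices once the kernel's retransmission timer expires; on a local network the peer typically resets the connection immediately, which would match the sub-minute reconnect seen on the Macs. Independent of the root cause, a command timeout makes the failure visible to the application right away instead of after the kernel gives up. A hedged sketch, assuming Lettuce 5.1+ where TimeoutOptions is available (the 10-second value is arbitrary):

import java.time.Duration;

import io.lettuce.core.ClientOptions;
import io.lettuce.core.RedisClient;
import io.lettuce.core.TimeoutOptions;

public class CommandTimeoutConfig {

    public static void apply(RedisClient client) {
        client.setOptions(ClientOptions.builder()
                // Fail in-flight commands after 10s with RedisCommandTimeoutException
                // instead of letting them hang until the socket is declared dead.
                .timeoutOptions(TimeoutOptions.enabled(Duration.ofSeconds(10)))
                .build());
    }
}

Commands that hit this timeout fail with RedisCommandTimeoutException, which is exactly what the health-check snippet in the comments below catches.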

Environment

  • Lettuce version(s): 5.3.1.RELEASE

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 14 (6 by maintainers)

Top GitHub Comments

1 reaction
before30 commented, Oct 14, 2021

I’m facing the same issue, especially when connecting to other data centers.

And I found gRPC has the same issue. gRPC performs a kind of self health check on the connection and makes a new connection when the ping fails. https://cs.mcgill.ca/~mxia3/2019/02/23/Using-gRPC-in-Production/

I tried this code for the health check and it works well.

LettuceConnectionFactory connectionFactory =
        (LettuceConnectionFactory) redisTemplate.getConnectionFactory();
try {
    connectionFactory.getConnection().ping();
} catch (RedisCommandTimeoutException ex) {
    connectionFactory.resetConnection();
}
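
Building on that snippet, here is a minimal sketch of how the ping could be run periodically, assuming a Spring application with @EnableScheduling configured (the class name and 30-second interval are made up for illustration):

import io.lettuce.core.RedisCommandTimeoutException;

import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Hypothetical component; requires @EnableScheduling on a configuration class.
@Component
public class RedisConnectionHealthCheck {

    private final RedisTemplate<String, String> redisTemplate;

    public RedisConnectionHealthCheck(RedisTemplate<String, String> redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    // PING every 30 seconds; if the command times out, assume the socket is
    // dead and force Lettuce to tear it down and reconnect immediately.
    @Scheduled(fixedDelay = 30_000)
    public void pingRedis() {
        LettuceConnectionFactory factory =
                (LettuceConnectionFactory) redisTemplate.getConnectionFactory();
        try {
            factory.getConnection().ping();
        } catch (RedisCommandTimeoutException ex) {
            factory.resetConnection();
        }
    }
}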

1 reaction
nitin-patil1 commented, Aug 27, 2021

Do we have any solution?

Read more comments on GitHub

Top Results From Across the Web

How to reconnect redis client after redis server reboot/scale
It takes up to 15min to reconnect with redis server from azure app service, however if I restart app service as soon as...

Redis client handling
Redis accepts clients connections on the configured TCP port and on the Unix ... client is idle for many seconds: the connection will...

Troubleshoot connecting to an ElastiCache for Redis cluster
I can't connect to my Amazon ElastiCache for Redis cluster. How can I troubleshoot ... Wait a few minutes until it updates to...

Best practices for connection resilience - Azure Cache for Redis
The connect timeout is the time your client waits to establish a connection with Redis server. Configure your client library to use a...

Redis slave keeps disconnecting during syn with master or ...
what we suspect is as redis slave is taking around two minutes to load received rdb file. meanwhile other master thinks its unreachable...
