Takes 20 minutes to reconnect to redis
Bug Report
Current Behavior
Upon losing the connection to Redis, it takes around 15 minutes for the connection to be re-established. We consistently see the following over and over in the logs until the connection is re-established:
19-10-2020 22:19:34.025 DEBUG r.n.http.server.HttpServerOperations [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] New http connection, requesting read
19-10-2020 22:19:34.025 DEBUG r.netty.channel.BootstrapHandlers [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Initialized pipeline DefaultChannelPipeline{(BootstrapHandlers$BootstrapInitializerHandler#0 = reactor.netty.channel.BootstrapHandlers$BootstrapInitializerHandler), (reactor.left.httpCodec = io.netty.handler.codec.http.HttpServerCodec), (reactor.left.httpTrafficHandler = reactor.netty.http.server.HttpTrafficHandler), (reactor.right.reactiveBridge = reactor.netty.channel.ChannelOperationsHandler)}
19-10-2020 22:19:34.025 DEBUG r.n.http.server.HttpServerOperations [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Increasing pending responses, now 1
19-10-2020 22:19:34.025 DEBUG reactor.netty.http.server.HttpServer [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Handler is being applied: org.springframework.http.server.reactive.ReactorHttpHandlerAdapter@7ad3dcc9
19-10-2020 22:19:34.028 DEBUG r.n.http.server.HttpServerOperations [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Detected non persistent http connection, preparing to close
19-10-2020 22:19:34.029 DEBUG r.n.http.server.HttpServerOperations [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Last HTTP response frame
19-10-2020 22:19:34.029 DEBUG r.n.http.server.HttpServerOperations [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Last HTTP packet was sent, terminating the channel
19-10-2020 22:19:45.907 DEBUG r.n.http.server.HttpServerOperations [id: 0xe376eb28, L:/10.0.0.123:8080 - R:/10.0.0.101:44736] New http connection, requesting read
After 15 or so minutes we then see:
stus2-test.redis.cache.windows.net/52.242.79.64:6380, chid=0x1] Unexpected exception during request: io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
19-10-2020 22:20:48.664 DEBUG i.l.core.protocol.ConnectionWatchdog correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, last known addr=obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380] userEventTriggered(ctx, SslCloseCompletionEvent(java.nio.channels.ClosedChannelException))
19-10-2020 22:20:48.664 DEBUG i.l.core.protocol.CommandHandler correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, chid=0x1] channelInactive()
19-10-2020 22:20:48.664 DEBUG i.l.core.protocol.DefaultEndpoint correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, epid=0x1] deactivating endpoint handler
19-10-2020 22:20:48.665 DEBUG i.l.core.protocol.DefaultEndpoint correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, epid=0x1] notifyQueuedCommands adding 3 command(s) to buffer
19-10-2020 22:20:48.665 DEBUG i.l.core.protocol.CommandHandler correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, chid=0x1] channelInactive() done
19-10-2020 22:20:48.665 DEBUG i.l.core.protocol.ConnectionWatchdog correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, last known addr=obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380] channelInactive()
19-10-2020 22:20:48.665 DEBUG i.l.core.protocol.ConnectionWatchdog correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, last known addr=obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380] scheduleReconnect()
Expected behavior/code
We expect the reconnect to happen in a minute or less.
Note: we only see this behavior in Azure Kubernetes. Running the same Docker image (i.e. the one deployed to the cloud) locally on our Macs, we see the reconnect happen in less than a minute.
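The roughly 15-minute gap before the readAddress(..) failed: Connection timed out error is consistent with the Linux kernel's default TCP retransmission limit (net.ipv4.tcp_retries2 = 15, roughly 15 minutes) rather than anything Lettuce itself controls; until the kernel gives up on the dead socket, the ConnectionWatchdog never sees the channel go inactive and so never schedules a reconnect. A commonly suggested mitigation is to enable TCP keepalive and a short connect timeout on the client. The following is a minimal, untested sketch for Lettuce 5.x; the endpoint, password, and timeout values are placeholders, not taken from this issue.

import java.time.Duration;

import io.lettuce.core.ClientOptions;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.SocketOptions;
import io.lettuce.core.api.StatefulRedisConnection;

public class ResilientRedisClient {

    public static StatefulRedisConnection<String, String> connect() {
        // Placeholder endpoint and credentials, not the ones from this issue.
        RedisURI uri = RedisURI.Builder
                .redis("example.redis.cache.windows.net", 6380)
                .withSsl(true)
                .withPassword("<access-key>")
                .build();

        // Fail fast when establishing the TCP connection, and enable TCP keepalive
        // so the OS has a chance to detect a silently dropped connection sooner.
        SocketOptions socketOptions = SocketOptions.builder()
                .connectTimeout(Duration.ofSeconds(10))
                .keepAlive(true)
                .build();

        RedisClient client = RedisClient.create(uri);
        client.setOptions(ClientOptions.builder()
                .socketOptions(socketOptions)
                .build());

        return client.connect();
    }
}

Note that keepAlive(true) only turns the socket option on; the probe interval still comes from the OS (often two hours by default), so on its own it may not shorten the window much without also tuning kernel keepalive settings or the TCP user timeout.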
Environment
- Lettuce version(s): Lettuce 5.3.1.RELEASE
Issue Analytics
- Created 3 years ago
- Comments: 14 (6 by maintainers)
Top Results From Across the Web

How to reconnect redis client after redis server reboot/scale
It takes up to 15min to reconnect with redis server from azure app service, however if I restart app service as soon as...

Redis client handling
Redis accepts clients connections on the configured TCP port and on the Unix ... client is idle for many seconds: the connection will...

Troubleshoot connecting to an ElastiCache for Redis cluster
I can't connect to my Amazon ElastiCache for Redis cluster. How can I troubleshoot ... Wait a few minutes until it updates to...

Best practices for connection resilience - Azure Cache for Redis
The connect timeout is the time your client waits to establish a connection with Redis server. Configure your client library to use a... (see the TCP_USER_TIMEOUT sketch after this list)

Redis slave keeps disconnecting during syn with master or ...
what we suspect is as redis slave is taking around two minutes to load received rdb file. meanwhile other master thinks its unreachable...
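In the spirit of the Azure connection-resilience guidance above, another option is to cap the kernel's retransmission window directly with TCP_USER_TIMEOUT. This only applies when the application runs on Linux with Lettuce's native epoll transport (netty-transport-native-epoll) on the classpath, which is typically the case in a Kubernetes pod. A hypothetical sketch; the 30-second value is illustrative.

import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.resource.ClientResources;
import io.lettuce.core.resource.DefaultClientResources;
import io.lettuce.core.resource.NettyCustomizer;
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.epoll.EpollChannelOption;

public class TcpUserTimeoutConfig {

    public static RedisClient createClient() {
        // Ask the kernel to give up on an unacknowledged connection after 30 seconds
        // instead of the default ~15 minutes of retransmissions.
        ClientResources resources = DefaultClientResources.builder()
                .nettyCustomizer(new NettyCustomizer() {
                    @Override
                    public void afterBootstrapInitialized(Bootstrap bootstrap) {
                        bootstrap.option(EpollChannelOption.TCP_USER_TIMEOUT, 30_000);
                    }
                })
                .build();

        // Placeholder endpoint, not the one from this issue.
        RedisURI uri = RedisURI.Builder
                .redis("example.redis.cache.windows.net", 6380)
                .withSsl(true)
                .build();

        return RedisClient.create(resources, uri);
    }
}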
Top GitHub Comments
I’m facing the same issue, especially when connecting to other data centers.
I found that gRPC has the same issue. gRPC performs a kind of self health check on the connection and opens a new connection when a ping fails. https://cs.mcgill.ca/~mxia3/2019/02/23/Using-gRPC-in-Production/
I tried this kind of code for the health check and it works well.
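The health-check code referenced in the comment is not included on this page. Purely as an illustration, a ping-based check along the lines the comment describes might look like the sketch below (class and field names are invented, and it assumes Lettuce's synchronous API), not the commenter's actual code.

import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;

public class RedisPingHealthCheck {

    private final RedisClient client;
    private volatile StatefulRedisConnection<String, String> connection;

    public RedisPingHealthCheck(RedisClient client) {
        this.client = client;
        this.connection = client.connect();
        // Keep the command timeout short so a dead connection fails the PING quickly.
        this.connection.setTimeout(Duration.ofSeconds(5));
    }

    public void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                connection.sync().ping();
            } catch (Exception e) {
                // The PING failed or timed out: discard the suspect connection and
                // open a fresh one instead of waiting for the OS to notice the failure.
                connection.closeAsync();
                connection = client.connect();
                connection.setTimeout(Duration.ofSeconds(5));
            }
        }, 10, 10, TimeUnit.SECONDS);
    }
}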
Do we have any solution?