Takes 20 minutes to reconnect to redis
Bug Report
Current Behavior
Upon losing the connection to Redis, it takes around 15 minutes for the connection to be re-established. We consistently see the following over and over in the logs until the connection is re-established:
19-10-2020 22:19:34.025 DEBUG r.n.http.server.HttpServerOperations [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] New http connection, requesting read
19-10-2020 22:19:34.025 DEBUG r.netty.channel.BootstrapHandlers [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Initialized pipeline DefaultChannelPipeline{(BootstrapHandlers$BootstrapInitializerHandler#0 = reactor.netty.channel.BootstrapHandlers$BootstrapInitializerHandler), (reactor.left.httpCodec = io.netty.handler.codec.http.HttpServerCodec), (reactor.left.httpTrafficHandler = reactor.netty.http.server.HttpTrafficHandler), (reactor.right.reactiveBridge = reactor.netty.channel.ChannelOperationsHandler)}
19-10-2020 22:19:34.025 DEBUG r.n.http.server.HttpServerOperations [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Increasing pending responses, now 1
19-10-2020 22:19:34.025 DEBUG reactor.netty.http.server.HttpServer [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Handler is being applied: org.springframework.http.server.reactive.ReactorHttpHandlerAdapter@7ad3dcc9
19-10-2020 22:19:34.028 DEBUG r.n.http.server.HttpServerOperations [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Detected non persistent http connection, preparing to close
19-10-2020 22:19:34.029 DEBUG r.n.http.server.HttpServerOperations [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Last HTTP response frame
19-10-2020 22:19:34.029 DEBUG r.n.http.server.HttpServerOperations [id: 0x6ef0bde5, L:/10.0.0.123:8080 - R:/10.0.0.6:45848] Last HTTP packet was sent, terminating the channel
19-10-2020 22:19:45.907 DEBUG r.n.http.server.HttpServerOperations [id: 0xe376eb28, L:/10.0.0.123:8080 - R:/10.0.0.101:44736] New http connection, requesting read
After 15 or so minutes we then see:
stus2-test.redis.cache.windows.net/52.242.79.64:6380, chid=0x1] Unexpected exception during request: io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
19-10-2020 22:20:48.664 DEBUG i.l.core.protocol.ConnectionWatchdog correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, last known addr=obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380] userEventTriggered(ctx, SslCloseCompletionEvent(java.nio.channels.ClosedChannelException))
19-10-2020 22:20:48.664 DEBUG i.l.core.protocol.CommandHandler correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, chid=0x1] channelInactive()
19-10-2020 22:20:48.664 DEBUG i.l.core.protocol.DefaultEndpoint correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, epid=0x1] deactivating endpoint handler
19-10-2020 22:20:48.665 DEBUG i.l.core.protocol.DefaultEndpoint correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, epid=0x1] notifyQueuedCommands adding 3 command(s) to buffer
19-10-2020 22:20:48.665 DEBUG i.l.core.protocol.CommandHandler correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, chid=0x1] channelInactive() done
19-10-2020 22:20:48.665 DEBUG i.l.core.protocol.ConnectionWatchdog correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, last known addr=obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380] channelInactive()
19-10-2020 22:20:48.665 DEBUG i.l.core.protocol.ConnectionWatchdog correlationId=81836eb2-f092-44df-b379-adf0d2390bd1 [channel=0xf2c1750a, /10.0.0.123:42450 -> obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380, last known addr=obapi-redis-cache-eastus2-test.redis.cache.windows.net/52.242.79.64:6380] scheduleReconnect()
Expected behavior/code
We expect the reconnect to happen in a minute or less.
Note: we only see this behavior in Azure Kubernetes. Running the same Docker image (i.e. the one deployed to the cloud) locally on our Macs, we see the reconnect happen in less than a minute.
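The roughly 15-minute gap before the readAddress(..) failed: Connection timed out error is consistent with the Linux kernel's default TCP retransmission limit (net.ipv4.tcp_retries2 = 15, roughly 15 minutes) rather than anything Lettuce itself controls; until the kernel gives up on the dead socket, the ConnectionWatchdog never sees the channel go inactive and so never schedules a reconnect. A commonly suggested mitigation is to enable TCP keepalive and a short connect timeout on the client. The following is a minimal, untested sketch for Lettuce 5.x; the endpoint, password, and timeout values are placeholders, not taken from this issue.

import java.time.Duration;

import io.lettuce.core.ClientOptions;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.SocketOptions;
import io.lettuce.core.api.StatefulRedisConnection;

public class ResilientRedisClient {

    public static StatefulRedisConnection<String, String> connect() {
        // Placeholder endpoint and credentials, not the ones from this issue.
        RedisURI uri = RedisURI.Builder
                .redis("example.redis.cache.windows.net", 6380)
                .withSsl(true)
                .withPassword("<access-key>")
                .build();

        // Fail fast when establishing the TCP connection, and enable TCP keepalive
        // so the OS has a chance to detect a silently dropped connection sooner.
        SocketOptions socketOptions = SocketOptions.builder()
                .connectTimeout(Duration.ofSeconds(10))
                .keepAlive(true)
                .build();

        RedisClient client = RedisClient.create(uri);
        client.setOptions(ClientOptions.builder()
                .socketOptions(socketOptions)
                .build());

        return client.connect();
    }
}

Note that keepAlive(true) only turns the socket option on; the probe interval still comes from the OS (often two hours by default), so on its own it may not shorten the window much without also tuning kernel keepalive settings or the TCP user timeout.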
Environment
- Lettuce version(s): Lettuce 5.3.1.RELEASE
Issue Analytics
- Created 3 years ago
- Comments: 14 (6 by maintainers)
Top Results From Across the Web

How to reconnect redis client after redis server reboot/scale
It takes up to 15min to reconnect with redis server from azure app service, however if I restart app service as soon as...

Redis client handling
Redis accepts clients connections on the configured TCP port and on the Unix ... client is idle for many seconds: the connection will...

Troubleshoot connecting to an ElastiCache for Redis cluster
I can't connect to my Amazon ElastiCache for Redis cluster. How can I troubleshoot ... Wait a few minutes until it updates to...

Best practices for connection resilience - Azure Cache for Redis
The connect timeout is the time your client waits to establish a connection with Redis server. Configure your client library to use a... (see the TCP_USER_TIMEOUT sketch after this list)

Redis slave keeps disconnecting during syn with master or ...
what we suspect is as redis slave is taking around two minutes to load received rdb file. meanwhile other master thinks its unreachable...
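In the spirit of the Azure connection-resilience guidance above, another option is to cap the kernel's retransmission window directly with TCP_USER_TIMEOUT. This only applies when the application runs on Linux with Lettuce's native epoll transport (netty-transport-native-epoll) on the classpath, which is typically the case in a Kubernetes pod. A hypothetical sketch; the 30-second value is illustrative.

import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.resource.ClientResources;
import io.lettuce.core.resource.DefaultClientResources;
import io.lettuce.core.resource.NettyCustomizer;
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.epoll.EpollChannelOption;

public class TcpUserTimeoutConfig {

    public static RedisClient createClient() {
        // Ask the kernel to give up on an unacknowledged connection after 30 seconds
        // instead of the default ~15 minutes of retransmissions.
        ClientResources resources = DefaultClientResources.builder()
                .nettyCustomizer(new NettyCustomizer() {
                    @Override
                    public void afterBootstrapInitialized(Bootstrap bootstrap) {
                        bootstrap.option(EpollChannelOption.TCP_USER_TIMEOUT, 30_000);
                    }
                })
                .build();

        // Placeholder endpoint, not the one from this issue.
        RedisURI uri = RedisURI.Builder
                .redis("example.redis.cache.windows.net", 6380)
                .withSsl(true)
                .build();

        return RedisClient.create(resources, uri);
    }
}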
Top GitHub Comments
I’m facing the same issue, especially when connecting to other data centers.
I found that gRPC has the same issue. gRPC performs a kind of self health check on the connection and opens a new connection when a ping fails. https://cs.mcgill.ca/~mxia3/2019/02/23/Using-gRPC-in-Production/
I tried this kind of code for the health check and it works well.
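The health-check code referenced in the comment is not included on this page. Purely as an illustration, a ping-based check along the lines the comment describes might look like the sketch below (class and field names are invented, and it assumes Lettuce's synchronous API), not the commenter's actual code.

import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;

public class RedisPingHealthCheck {

    private final RedisClient client;
    private volatile StatefulRedisConnection<String, String> connection;

    public RedisPingHealthCheck(RedisClient client) {
        this.client = client;
        this.connection = client.connect();
        // Keep the command timeout short so a dead connection fails the PING quickly.
        this.connection.setTimeout(Duration.ofSeconds(5));
    }

    public void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                connection.sync().ping();
            } catch (Exception e) {
                // The PING failed or timed out: discard the suspect connection and
                // open a fresh one instead of waiting for the OS to notice the failure.
                connection.closeAsync();
                connection = client.connect();
                connection.setTimeout(Duration.ofSeconds(5));
            }
        }, 10, 10, TimeUnit.SECONDS);
    }
}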
Do we have any solution?