
Possibility of broken connections on the pool


Expected behavior

When using a fixed connection pool (min=64, max=64) to every node (masters and slaves) in a cluster configuration, Redisson should open a healthy pool of 64 connections to each node.

Actual behavior

It seems that in some cases, when our app container (a dockerized app) starts, network or warm-up issues (we also see a few CLUSTER_NODES and CLUSTER_INFO timeouts during startup) leave some connections in the pool broken. Nothing is obviously wrong at low traffic, but once the load increases a bit, some requests to those instances fail with timeouts (we deployed 20 instances and 2 ended up in this state). This does not seem to happen on instances that started up and opened all their connections properly, but the instances with a faulty startup appear to stay broken: they neither recover nor re-create the broken connections.

In any case, as said, this only happens in those cases during startup, so that is our primary hypothesis. Is there any logic that could be put in place to deal with potentially broken connections in the pool, some pool monitoring we could enable, or some configuration we should change? Please advise.
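The monitoring half of this request amounts to comparing each node's number of healthy connections against the configured minimum idle size (64 here). The following is a minimal stdlib-only sketch of that comparison, not a Redisson API: ConnState, degradedNodes and PoolHealthReport are hypothetical names, and it assumes some probe has already classified every pooled connection as healthy or broken.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PoolHealthReport {

    // Hypothetical per-connection state; assumes a probe has already classified each connection.
    enum ConnState { HEALTHY, BROKEN }

    // Returns each node whose healthy-connection count is below expectedMinIdle, mapped to that count.
    static Map<String, Integer> degradedNodes(Map<String, List<ConnState>> pools, int expectedMinIdle) {
        Map<String, Integer> degraded = new LinkedHashMap<>();
        for (Map.Entry<String, List<ConnState>> entry : pools.entrySet()) {
            int healthy = (int) entry.getValue().stream()
                    .filter(s -> s == ConnState.HEALTHY)
                    .count();
            if (healthy < expectedMinIdle) {
                degraded.put(entry.getKey(), healthy);
            }
        }
        return degraded;
    }

    public static void main(String[] args) {
        Map<String, List<ConnState>> pools = new LinkedHashMap<>();
        // A fully healthy pool of 64 connections.
        pools.put("redis://redis001.prod.local:6329", Collections.nCopies(64, ConnState.HEALTHY));
        // A pool where 4 of 64 connections broke during a faulty startup.
        List<ConnState> partial = new ArrayList<>(Collections.nCopies(60, ConnState.HEALTHY));
        partial.addAll(Collections.nCopies(4, ConnState.BROKEN));
        pools.put("redis://redis002.prod.local:6329", partial);

        System.out.println(degradedNodes(pools, 64)); // {redis://redis002.prod.local:6329=60}
    }
}
```

A report like this only detects the degraded nodes; replacing the broken connections still needs a separate mechanism.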

Example timeout (stack trace excerpt):

	at rapid.shaded.org.redisson.command.CommandBatchService$3.run(CommandBatchService.java:675)
	at rapid.shaded.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:682)
	at rapid.shaded.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:757)
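The trace shows the timeout being fired by a Netty HashedWheelTimer task scheduled from CommandBatchService, that is, a timer expired before a reply arrived. One plausible way a broken connection produces exactly this symptom is that it is "half-open": writes appear to succeed locally, but no reply ever comes back, so the caller only learns about it when the response timeout (100 ms in this configuration) expires. The sketch below models that with the JDK's CompletableFuture.orTimeout (Java 9+) standing in for the wheel timer; FakeConnection and execute are hypothetical and this is not how Redisson is implemented internally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ResponseTimeoutSketch {

    // Hypothetical pooled connection. A half-open connection accepts the write
    // but never delivers a reply.
    static final class FakeConnection {
        final boolean halfOpen;

        FakeConnection(boolean halfOpen) {
            this.halfOpen = halfOpen;
        }

        CompletableFuture<String> send(String command) {
            CompletableFuture<String> reply = new CompletableFuture<>();
            if (!halfOpen) {
                reply.complete("+PONG"); // a healthy connection answers
            }
            // a half-open connection leaves the reply pending forever
            return reply;
        }
    }

    // Sends a command and gives up if no reply arrives within timeoutMs,
    // playing the role of the "timeout" setting.
    static String execute(FakeConnection conn, String command, long timeoutMs) {
        try {
            return conn.send(command).orTimeout(timeoutMs, TimeUnit.MILLISECONDS).join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return "TIMEOUT";
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(execute(new FakeConnection(false), "PING", 100)); // +PONG
        System.out.println(execute(new FakeConnection(true), "PING", 100));  // TIMEOUT
    }
}
```

Because nothing on the client side fails immediately, such a connection can stay in the pool and keep producing timeouts whenever the load balancer hands it out, which matches the behavior only showing up under higher traffic.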

Steps to reproduce or test case

Trying to isolate an easy way to reproduce it at the moment.

Redis version

3.2.8

Redisson version

3.10.4

Redisson configuration

    "connectTimeout": 10000,
    "timeout": 100,
    "retryInterval": 50,
    "retryAttempts": 4,
    "masterConnectionMinimumIdleSize": 64,
    "masterConnectionPoolSize": 64,
    "slaveConnectionMinimumIdleSize": 64,
    "slaveConnectionPoolSize": 64,
    "keepAlive": true,
    "tcpNoDelay": true,
    "readMode": "MASTER_SLAVE",
    "nodeAddresses": [
        "redis://redis001.prod.local:6329",
        "redis://redis001.prod.local:6339",
        "redis://redis001.prod.local:6349",
        "redis://redis002.prod.local:6329",
        "redis://redis002.prod.local:6339",
        "redis://redis002.prod.local:6349",
        "redis://redis003.prod.local:6329",
        "redis://redis003.prod.local:6339",
        "redis://redis003.prod.local:6349",
        "redis://redis004.prod.local:6329",
        "redis://redis004.prod.local:6339",
        "redis://redis004.prod.local:6349",
        "redis://redis005.prod.local:6329",
        "redis://redis005.prod.local:6339",
        "redis://redis005.prod.local:6349",
        "redis://redis006.prod.local:6329",
        "redis://redis006.prod.local:6339",
        "redis://redis006.prod.local:6349"
    ]
},
"useLinuxNativeEpoll": true

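To give a sense of how much work happens at startup with this configuration, here is a small worked calculation. It assumes, as the expected behavior above states, that each of the 18 listed addresses (6 hosts x 3 ports) gets exactly its minimum idle pool of 64 connections, and it ignores any other connections Redisson may open, such as pub/sub connections. PoolSizeArithmetic and nodeAddresses are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

public class PoolSizeArithmetic {

    // Rebuilds the 18 node addresses from the configuration: 6 hosts x 3 ports.
    static List<String> nodeAddresses() {
        List<String> nodes = new ArrayList<>();
        int[] ports = {6329, 6339, 6349};
        for (int host = 1; host <= 6; host++) {
            for (int port : ports) {
                nodes.add(String.format("redis://redis%03d.prod.local:%d", host, port));
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        int minIdlePerNode = 64; // masterConnectionMinimumIdleSize = slaveConnectionMinimumIdleSize
        int appInstances = 20;   // instances deployed, per the issue text
        int nodes = nodeAddresses().size();

        int perAppInstance = nodes * minIdlePerNode;      // opened by each app instance at startup
        int perRedisNode = appInstances * minIdlePerNode; // accepted by each Redis node from the fleet
        int fleetTotal = perAppInstance * appInstances;   // all client connections combined

        System.out.println(nodes + " nodes");                      // 18 nodes
        System.out.println(perAppInstance + " per app instance"); // 1152 per app instance
        System.out.println(perRedisNode + " per Redis node");     // 1280 per Redis node
        System.out.println(fleetTotal + " across the fleet");     // 23040 across the fleet
    }
}
```

So every app instance tries to establish over a thousand connections while it is still warming up, which is consistent with the startup-time CLUSTER_NODES and CLUSTER_INFO timeouts mentioned above.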

Top GitHub Comments

mrniko commented, Apr 16, 2019

Did you try setting pingConnectionInterval? It helps avoid broken connections by periodically sending the Redis PING command. A broken connection is reconnected if Redis fails to respond.
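The idea behind pingConnectionInterval can be illustrated with a minimal stdlib-only sketch; it is not Redisson's implementation. A fixed-size pool is checked on a fixed period, each connection is "pinged", and any connection that fails is replaced with a fresh one. Conn, Pool, checkOnce and the 100 ms period are hypothetical; in a real client the ping would send PING over the socket and treat a missing or failed reply as a broken connection. In Redisson itself the interval is given in milliseconds, for example as an extra "pingConnectionInterval" entry next to "timeout" in the JSON configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PingReconnectSketch {

    // Hypothetical connection; "alive" stands in for whether the socket still works.
    static final class Conn {
        volatile boolean alive = true;

        // A real client would send PING and wait for PONG within a timeout.
        boolean ping() {
            return alive;
        }
    }

    // Fixed-size pool that replaces any connection whose ping fails.
    static final class Pool {
        private final List<Conn> conns = new ArrayList<>();
        private int reconnects = 0;

        Pool(int size) {
            for (int i = 0; i < size; i++) {
                conns.add(new Conn());
            }
        }

        // One health-check pass: ping every connection and reconnect the broken ones.
        synchronized void checkOnce() {
            for (int i = 0; i < conns.size(); i++) {
                if (!conns.get(i).ping()) {
                    conns.set(i, new Conn()); // replace the broken connection
                    reconnects++;
                }
            }
        }

        synchronized long aliveCount() {
            return conns.stream().filter(c -> c.alive).count();
        }

        synchronized int reconnects() {
            return reconnects;
        }

        synchronized Conn get(int i) {
            return conns.get(i);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Pool pool = new Pool(64);
        pool.get(3).alive = false;  // simulate connections broken during a bad startup
        pool.get(10).alive = false;

        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Analogous to pingConnectionInterval: run the check on a fixed period.
        timer.scheduleAtFixedRate(pool::checkOnce, 0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(250);
        timer.shutdownNow();

        System.out.println(pool.aliveCount() + " alive, " + pool.reconnects() + " reconnected");
    }
}
```

For brevity this sweeps the whole pool in a single scheduled task; Redisson may schedule its checks differently. The point is only that a periodic probe turns a silently broken connection into one that gets detected and replaced, instead of one that keeps timing out under load.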

mrniko commented, Feb 17, 2020

Fixed in https://github.com/redisson/redisson/issues/2587