Possibility of broken connections on the pool

Expected behavior

With a fixed connection pool of min=64 and max=64 per node (masters and slaves) in a cluster configuration, Redisson should open a healthy pool of 64 connections to each node.

Actual behavior

In some cases, when our app container (a dockerized app) starts, network or warm-up issues (also visible as a few CLUSTER_NODES and CLUSTER_INFO timeouts during startup) seem to leave some connections in the pool broken. Nothing is easy to observe at low traffic, but once the load increases a bit, some requests to those instances fail with timeouts (we deployed 20 instances and 2 ended up like this). This does not seem to happen on instances that opened all their connections properly at startup; the instances with a faulty startup appear to remain in a broken state and never recover or re-create the broken connections.

In any case, as said, this only happens after a faulty startup, so that is our primary hypothesis. Is there any logic that could be put in place to deal with potentially broken connections in the pool, any pool monitoring we could enable, or any configuration we should change? Please advise.

Example timeout:

	at rapid.shaded.org.redisson.command.CommandBatchService$3.run(CommandBatchService.java:675)
	at rapid.shaded.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:682)
	at rapid.shaded.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:757)

Steps to reproduce or test case

We are currently trying to isolate an easy way to reproduce it.

Redis version

3.2.8

Redisson version

3.10.4

Redisson configuration

    "connectTimeout": 10000,
    "timeout": 100,
    "retryInterval": 50,
    "retryAttempts": 4,
    "masterConnectionMinimumIdleSize": 64,
    "masterConnectionPoolSize": 64,
    "slaveConnectionMinimumIdleSize": 64,
    "slaveConnectionPoolSize": 64,
    "keepAlive": true,
    "tcpNoDelay": true,
    "readMode": "MASTER_SLAVE",
    "nodeAddresses": [
        "redis://redis001.prod.local:6329",
        "redis://redis001.prod.local:6339",
        "redis://redis001.prod.local:6349",
        "redis://redis002.prod.local:6329",
        "redis://redis002.prod.local:6339",
        "redis://redis002.prod.local:6349",
        "redis://redis003.prod.local:6329",
        "redis://redis003.prod.local:6339",
        "redis://redis003.prod.local:6349",
        "redis://redis004.prod.local:6329",
        "redis://redis004.prod.local:6339",
        "redis://redis004.prod.local:6349",
        "redis://redis005.prod.local:6329",
        "redis://redis005.prod.local:6339",
        "redis://redis005.prod.local:6349",
        "redis://redis006.prod.local:6329",
        "redis://redis006.prod.local:6339",
        "redis://redis006.prod.local:6349"
    ]
},
"useLinuxNativeEpoll": true

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

1 reaction
mrniko commented, Apr 16, 2019

Did you try setting pingConnectionInterval? It helps avoid broken connections by periodically sending the Redis PING command on each connection. A broken connection gets reconnected if Redis fails to respond.
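
As a rough illustration of that suggestion, the fragment below adds pingConnectionInterval to a cluster configuration like the one in the issue. It is only a sketch under assumptions: the enclosing clusterServersConfig key is not shown in the original report and is assumed here from Redisson's standard JSON configuration layout, the interval of 1000 (milliseconds) is an arbitrary example value rather than a recommendation from the maintainer, and the node list is shortened to a single address from the issue.

```json
{
  "clusterServersConfig": {
    "pingConnectionInterval": 1000,
    "keepAlive": true,
    "tcpNoDelay": true,
    "nodeAddresses": [
      "redis://redis001.prod.local:6329"
    ]
  },
  "useLinuxNativeEpoll": true
}
```

The remaining pool and timeout settings from the issue would stay as they are. The only change being illustrated is the extra pingConnectionInterval entry, and a suitable interval would need to be chosen against the traffic and timeout values of the actual deployment.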

