broken connections on the pool

Related to https://github.com/redisson/redisson/issues/2043

Expected behavior

With a fixed connection pool of min=64 and max=64 to every node (masters and slaves) in a cluster config, Redisson is able to open a healthy pool of 64 connections to each node.

Actual behavior

Connections never recover.

Redis version

4.0.8, with a local cluster created using utils/create-cluster

Redisson version

3.11.5

Steps to reproduce

Redisson configuration

		connectTimeout = 500
		timeout = 100
		masterConnectionMinimumIdleSize = 1 <== just to make it easier to reproduce.
		masterConnectionPoolSize = 1
		slaveConnectionMinimumIdleSize = 1
		slaveConnectionPoolSize = 1
		retryAttempts = 3
		retryInterval = 25
		keepAlive = true
		tcpNoDelay = true
		readMode = MASTER_SLAVE
		pingConnectionInterval = 2000
		nodeAddresses = [
			"redis://127.0.0.1:30001",
			"redis://127.0.0.1:30002",
			"redis://127.0.0.1:30003",
			"redis://127.0.0.1:30004",
			"redis://127.0.0.1:30005",
			"redis://127.0.0.1:30006"
		]

Request controller: each HTTP GET request fetches 100 keys from the Redis cluster in batch mode.

	@GetMapping(path = "/get/{id}", produces = MediaType.APPLICATION_JSON_VALUE)
	public CompletionStage<Map<String, String>> get(@PathVariable("id") int id) {
		// Build the 100 keys for this id: test<id * 100> .. test<id * 100 + 99>.
		Set<String> req = new HashSet<>();
		for (int i = 0; i < 100; i++) {
			req.add("test" + (id * 100 + i));
		}
		// myRedisClusterWithPrefix is not defined in this report.
		CompletionStage<Map<String, String>> test = myRedisClusterWithPrefix.getAllAsync(req);
		test.thenAccept(mm -> {
			//System.out.println("return "+ mm);
		});
		return test;
	}
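
To make it easier to match the controller's requests against the debug log, here is a minimal sketch that rebuilds the key set for a given id. BatchKeys and keysFor are hypothetical names introduced for illustration; they are not part of the original report or of Redisson.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not from the original report): rebuilds the set of
// keys that the controller above requests for a given path id.
final class BatchKeys {

    // Mirrors the controller loop: "test" + (id * 100 + i) for i in [0, 100).
    static Set<String> keysFor(int id) {
        Set<String> keys = new LinkedHashSet<>(); // keeps insertion order
        for (int i = 0; i < 100; i++) {
            keys.add("test" + (id * 100 + i));
        }
        return keys;
    }

    public static void main(String[] args) {
        Set<String> keys = keysFor(5);
        // prints "100 keys, first=test500"
        System.out.println(keys.size() + " keys, first=" + keys.iterator().next());
    }
}
```

For /get/5 this yields test500 through test599. That range includes test560, which appears in the log below as api:rapid:test:test560; the api:rapid:test: prefix is presumably added by myRedisClusterWithPrefix, though the report does not show that code.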

Stress test with a concurrency of 100: ab -n 10000 -c 100 http://localhost:8080/rapid-samples-minimal-spring-boot/get/5

After it has run for about 20 seconds, stop the stress test tool.

Then I call curl http://localhost:8080/rapid-samples-minimal-spring-boot/get/5 again. It returns an error right away and never recovers. The error log is:

17:34:10.813 [DEBUG] r.s.o.r.c.ClusterConnectionManager - slot 3699 for api:rapid:test:test560
17:34:10.813 [DEBUG] r.s.o.r.c.RedisExecutor - connection released for command null and params null from slot NodeSource [slot=null, addr=null, redisClient=null, redirect=null, entry=MasterSlaveEntry [masterEntry=[freeSubscribeConnectionsAmount=1, freeSubscribeConnectionsCounter=value:50:queue:0, freeConnectionsAmount=1, freeConnectionsCounter=value:1:queue:0, freezed=false, freezeReason=null, client=[addr=redis://127.0.0.1:30004], nodeType=MASTER, firstFail=0]]] using connection RedisConnection@9214847 [redisClient=[addr=redis://127.0.0.1:30004], channel=[id: 0x25f4bc57, L:/127.0.0.1:53318 - R:127.0.0.1/127.0.0.1:30004], currentCommand=null]
17:34:10.846 [DEBUG] r.s.o.r.c.RedisExecutor - attempt 1 for command null and params null
17:34:10.846 [DEBUG] r.s.o.r.c.RedisExecutor - attempt 1 for command null and params null
17:34:10.881 [DEBUG] r.s.o.r.c.RedisExecutor - attempt 2 for command null and params null
17:34:10.881 [DEBUG] r.s.o.r.c.RedisExecutor - attempt 2 for command null and params null
17:34:10.916 [DEBUG] r.s.o.r.c.RedisExecutor - attempt 3 for command null and params null
17:34:10.917 [DEBUG] r.s.o.r.c.RedisExecutor - attempt 3 for command null and params null
17:34:10.957 [ERROR] c.r.r.e.h.RapidSpringErrorHandler - Unable to process unhandled exception
rapid.shaded.org.redisson.client.RedisTimeoutException: Unable to acquire connection! Increase connection pool size and/or retryIntervalNode source: NodeSource [slot=null, addr=null, redisClient=null, redirect=null, entry=MasterSlaveEntry [masterEntry=[freeSubscribeConnectionsAmount=1, freeSubscribeConnectionsCounter=value:50:queue:0, freeConnectionsAmount=1, freeConnectionsCounter=value:1:queue:0, freezed=false, freezeReason=null, client=[addr=redis://127.0.0.1:30005], nodeType=MASTER, firstFail=0]]], command: null, params: null after 0 retry attempts
	at rapid.shaded.org.redisson.command.RedisExecutor$2.run(RedisExecutor.java:191)
	at rapid.shaded.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:680)
	at rapid.shaded.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:755)
	at rapid.shaded.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:483)
	at rapid.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:834)

Connection status: the connections are still established.

~ » lsof -P -p `jps |grep Application |awk '{print $1}'` |grep IPv |grep 3000
java    14825 yinchin.chen  237u     IPv6 0x3dec6b4ca72da4d1        0t0      TCP localhost:53301->localhost:30001 (ESTABLISHED)
java    14825 yinchin.chen  239u     IPv6 0x3dec6b4ca72dcd11        0t0      TCP localhost:53302->localhost:30006 (ESTABLISHED)
java    14825 yinchin.chen  241u     IPv6 0x3dec6b4ca72db051        0t0      TCP localhost:53303->localhost:30004 (ESTABLISHED)
java    14825 yinchin.chen  243u     IPv6 0x3dec6b4ca72dd2d1        0t0      TCP localhost:53304->localhost:30005 (ESTABLISHED)
java    14825 yinchin.chen  245u     IPv6 0x3dec6b4ca8fb3611        0t0      TCP localhost:53305->localhost:30002 (ESTABLISHED)
java    14825 yinchin.chen  246u     IPv6 0x3dec6b4ca8fb3051        0t0      TCP localhost:53306->localhost:30003 (ESTABLISHED)
java    14825 yinchin.chen  247u     IPv6 0x3dec6b4ca8fb3bd1        0t0      TCP localhost:53309->localhost:30001 (ESTABLISHED)
java    14825 yinchin.chen  248u     IPv6 0x3dec6b4ca8fb4191        0t0      TCP localhost:53311->localhost:30002 (ESTABLISHED)
java    14825 yinchin.chen  249u     IPv6 0x3dec6b4ca8fb4751        0t0      TCP localhost:53307->localhost:30001 (ESTABLISHED)
java    14825 yinchin.chen  250u     IPv6 0x3dec6b4ca8fb2a91        0t0      TCP localhost:53310->localhost:30003 (ESTABLISHED)
java    14825 yinchin.chen  253u     IPv6 0x3dec6b4ca8fb24d1        0t0      TCP localhost:53314->localhost:30005 (ESTABLISHED)
java    14825 yinchin.chen  254u     IPv6 0x3dec6b4ca8fb4d11        0t0      TCP localhost:53315->localhost:30006 (ESTABLISHED)
java    14825 yinchin.chen  255u     IPv6 0x3dec6b4ca8fb1f11        0t0      TCP localhost:53308->localhost:30005 (ESTABLISHED)
java    14825 yinchin.chen  256u     IPv6 0x3dec6b4ca8fb1951        0t0      TCP localhost:53313->localhost:30004 (ESTABLISHED)
java    14825 yinchin.chen  260u     IPv6 0x3dec6b4ca8fb52d1        0t0      TCP localhost:53316->localhost:30006 (ESTABLISHED)
java    14825 yinchin.chen  261u     IPv6 0x3dec6b4ca9512611        0t0      TCP localhost:53312->localhost:30004 (ESTABLISHED)
java    14825 yinchin.chen  277u     IPv6 0x3dec6b4ca9513751        0t0      TCP localhost:53318->localhost:30004 (ESTABLISHED)
java    14825 yinchin.chen  279u     IPv6 0x3dec6b4ca9513d11        0t0      TCP localhost:53319->localhost:30006 (ESTABLISHED)
java    14825 yinchin.chen  380u     IPv6 0x3dec6b4cabc0da91        0t0      TCP localhost:53419->localhost:30005 (ESTABLISHED)
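
To compare the listing above with the configured pool sizes, the connections can be tallied per Redis node. The following is a minimal sketch, assuming lsof prints host names without colons (as in the output above); LsofTally and countByRemotePort are hypothetical names introduced here, not part of any tool mentioned in the report.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts ESTABLISHED TCP connections per remote port
// from lines of lsof output such as those shown above.
final class LsofTally {

    // Matches e.g. "->localhost:30001 (ESTABLISHED)" and captures the port.
    // Assumption: the remote host name contains no ':' (no raw IPv6 address).
    private static final Pattern REMOTE =
            Pattern.compile("->[^:\\s]+:(\\d+) \\(ESTABLISHED\\)");

    static Map<Integer, Integer> countByRemotePort(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>(); // sorted by port
        for (String line : lines) {
            Matcher m = REMOTE.matcher(line);
            if (m.find()) {
                counts.merge(Integer.parseInt(m.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three lines copied from the listing above.
        List<String> sample = List.of(
            "java    14825 yinchin.chen  237u     IPv6 0x3dec6b4ca72da4d1        0t0      TCP localhost:53301->localhost:30001 (ESTABLISHED)",
            "java    14825 yinchin.chen  239u     IPv6 0x3dec6b4ca72dcd11        0t0      TCP localhost:53302->localhost:30006 (ESTABLISHED)",
            "java    14825 yinchin.chen  247u     IPv6 0x3dec6b4ca8fb3bd1        0t0      TCP localhost:53309->localhost:30001 (ESTABLISHED)");
        // prints "{30001=2, 30006=1}"
        System.out.println(countByRemotePort(sample));
    }
}
```

Applied to the full listing, the tally is 3 connections to port 30001, 2 each to 30002 and 30003, and 4 each to 30004, 30005 and 30006, for 19 in total.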

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

2 reactions
mrniko commented, Feb 14, 2020

I reproduced the issue. It affects only RBatch objects in cluster mode. Thanks for the test case.

1 reaction
mrniko commented, Feb 17, 2020

Fixed!
