Improvement of JedisClusterInfoCache#renewClusterSlots
Recently I noticed the number of threads of a Java application running in production increased significantly and then recovered in a short time (~1 minute). Many threads had the stack trace below:
"thrift-worker-715" Id=11815 WAITING on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@4fb62711 owned by "thrift-worker-718" Id=11818
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@4fb62711
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at redis.clients.jedis.JedisClusterInfoCache.getSlotPool(JedisClusterInfoCache.java:234)
at redis.clients.jedis.JedisSlotBasedConnectionHandler.getConnectionFromSlot(JedisSlotBasedConnectionHandler.java:62)
at redis.clients.jedis.JedisClusterConnectionHandlerWraper.getConnectionFromSlot(JedisClusterConnectionHandlerWraper.java:103)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:116)
at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:31)
and the stack trace of thread thrift-worker-718:
"thrift-worker-718" Id=11818 RUNNABLE (in native)
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.net.SocketInputStream.read(SocketInputStream.java:127)
at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:196)
at redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40)
at redis.clients.jedis.Protocol.process(Protocol.java:151)
at redis.clients.jedis.Protocol.read(Protocol.java:215)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:340)
at redis.clients.jedis.Connection.getStatusCodeReply(Connection.java:239)
at redis.clients.jedis.BinaryJedis.quit(BinaryJedis.java:253)
at redis.clients.jedis.JedisFactory.destroyObject(JedisFactory.java:88)
at org.apache.commons.pool2.impl.GenericObjectPool.destroy(GenericObjectPool.java:921)
at org.apache.commons.pool2.impl.GenericObjectPool.invalidateObject(GenericObjectPool.java:626)
at redis.clients.util.Pool.returnBrokenResourceObject(Pool.java:101)
at redis.clients.jedis.JedisPool.returnBrokenResource(JedisPool.java:239)
at redis.clients.jedis.JedisPool.returnBrokenResource(JedisPool.java:16)
at redis.clients.jedis.Jedis.close(Jedis.java:3407)
at redis.clients.jedis.JedisClusterInfoCache.renewClusterSlots(JedisClusterInfoCache.java:110)
at redis.clients.jedis.JedisClusterConnectionHandler.renewSlotCache(JedisClusterConnectionHandler.java:52)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:135)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:141)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:141)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:141)
at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:31)
It seems many threads were waiting for a write lock held by another thread, but that thread's I/O operation was quite slow; the client may have had a network issue with one of the Redis nodes. In our production environment the renewClusterSlots operation takes 500+ ms (connectTimeout 200 ms + soTimeout 300 ms), which caused lots of operations to time out.
So I looked into the source code of redis.clients.jedis.JedisClusterInfoCache#renewClusterSlots.
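To make the contention easier to see, here is a deliberately simplified sketch of the pattern, not the actual Jedis source; the class name, fields, and the rebuildSlotCache helper are invented for illustration. The point is that the write lock is held across the whole slot-discovery I/O, so every reader of getSlotPool blocks for up to connectTimeout + soTimeout:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

// Hypothetical, simplified slot cache; only loosely mirrors JedisClusterInfoCache.
class SlotCacheSketch {
  private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
  private final Lock r = rwl.readLock();
  private final Lock w = rwl.writeLock();
  private final Map<Integer, JedisPool> slotPools = new HashMap<>();

  void renewClusterSlots(Jedis jedis) {
    w.lock();                                    // writer takes the lock...
    try {
      List<Object> slots = jedis.clusterSlots(); // ...and holds it across slow network I/O
      rebuildSlotCache(slots);                   // plus closing of broken connections
    } finally {
      w.unlock();
    }
  }

  JedisPool getSlotPool(int slot) {
    r.lock();   // readers pile up here until the writer finishes its I/O
    try {
      return slotPools.get(slot);
    } finally {
      r.unlock();
    }
  }

  private void rebuildSlotCache(List<Object> slots) {
    // omitted: parse the CLUSTER SLOTS reply and repopulate slotPools
  }
}
```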
There are two points I think could be improved:
1. Reduce lock granularity: move the I/O operation out of the lock block (see the sketch below).
2. Call renewClusterSlots while explicitly excluding the Redis node that caused the IOException (it may have a network issue with the client, or may simply be down).
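A minimal sketch of point 1, reusing the hypothetical SlotCacheSketch fields above (again, this is not the actual change in #2514): query CLUSTER SLOTS before taking the write lock, and hold the lock only for the in-memory swap, so readers are blocked for microseconds rather than for the full network round trip.

```java
  // Point 1: network I/O happens with no lock held; the write lock only
  // protects the cheap in-memory rebuild of the slot-to-pool map.
  void renewClusterSlots(Jedis jedis) {
    List<Object> slots = jedis.clusterSlots();  // slow network I/O, no lock held
    w.lock();
    try {
      rebuildSlotCache(slots);                  // in-memory swap only
    } finally {
      w.unlock();
    }
  }
```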
I would like to open a PR to do this. Any thoughts?
Resolved by #2514
@sazzad16 Could you please take a look at #2514?