Improvement of JedisClusterInfoCache#renewClusterSlots


Recently I noticed that the number of threads in a Java application running in production increased significantly and then recovered within a short time (~1 min). Many of the threads had the stack trace below:

"thrift-worker-715" Id=11815 WAITING on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@4fb62711 owned by "thrift-worker-718" Id=11818
        at sun.misc.Unsafe.park(Native Method)
        -  waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@4fb62711
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at redis.clients.jedis.JedisClusterInfoCache.getSlotPool(JedisClusterInfoCache.java:234)
        at redis.clients.jedis.JedisSlotBasedConnectionHandler.getConnectionFromSlot(JedisSlotBasedConnectionHandler.java:62)
        at redis.clients.jedis.JedisClusterConnectionHandlerWraper.getConnectionFromSlot(JedisClusterConnectionHandlerWraper.java:103)
        at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:116)
        at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:31)

The stack trace of thread thrift-worker-718, which owned the lock:

"thrift-worker-718" Id=11818 RUNNABLE (in native)
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.net.SocketInputStream.read(SocketInputStream.java:127)
        at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:196)
        at redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40)
        at redis.clients.jedis.Protocol.process(Protocol.java:151)
        at redis.clients.jedis.Protocol.read(Protocol.java:215)
        at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:340)
        at redis.clients.jedis.Connection.getStatusCodeReply(Connection.java:239)
        at redis.clients.jedis.BinaryJedis.quit(BinaryJedis.java:253)
        at redis.clients.jedis.JedisFactory.destroyObject(JedisFactory.java:88)
        at org.apache.commons.pool2.impl.GenericObjectPool.destroy(GenericObjectPool.java:921)
        at org.apache.commons.pool2.impl.GenericObjectPool.invalidateObject(GenericObjectPool.java:626)
        at redis.clients.util.Pool.returnBrokenResourceObject(Pool.java:101)
        at redis.clients.jedis.JedisPool.returnBrokenResource(JedisPool.java:239)
        at redis.clients.jedis.JedisPool.returnBrokenResource(JedisPool.java:16)
        at redis.clients.jedis.Jedis.close(Jedis.java:3407)
        at redis.clients.jedis.JedisClusterInfoCache.renewClusterSlots(JedisClusterInfoCache.java:110)
        at redis.clients.jedis.JedisClusterConnectionHandler.renewSlotCache(JedisClusterConnectionHandler.java:52)
        at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:135)
        at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:141)
        at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:141)
        at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:141)
        at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:31)

It seems that many threads were waiting for the read lock while another thread held the write lock. That thread was blocked on slow I/O, possibly because the client had a network issue with one of the Redis nodes. The renewClusterSlots operation took 500+ ms (connectTimeout 200 ms + soTimeout 300 ms) in our production environment, which caused many operations to time out.

So I looked into the source code of redis.clients.jedis.JedisClusterInfoCache#renewClusterSlots.

I think two things could be improved:

1. Reduce the lock granularity by moving the I/O operations out of the locked block.
2. Call renewClusterSlots with an explicit exclusion of the Redis node that caused the IOException (it may have a network issue with the client, or it may simply be down).
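The two proposed changes can be sketched roughly as follows. This is a minimal illustration, not the Jedis implementation and not necessarily what the eventual fix does: SlotCacheSketch, its fetcher function, and the node strings are all hypothetical stand-ins. It shows the slow network call running with no lock held, the write lock taken only for the in-memory swap, and a caller-supplied set of nodes to skip.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical stand-in for a cluster slot cache; NOT the Jedis class.
final class SlotCacheSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> nodes;
    // Hypothetical fetcher: performs the (possibly slow) network call to one
    // node and returns its slot -> node view; throws RuntimeException on I/O failure.
    private final Function<String, Map<Integer, String>> fetcher;
    private Map<Integer, String> slotToNode = new HashMap<>();

    SlotCacheSketch(List<String> nodes, Function<String, Map<Integer, String>> fetcher) {
        this.nodes = nodes;
        this.fetcher = fetcher;
    }

    // Readers hold the read lock only for an in-memory lookup.
    String getNodeForSlot(int slot) {
        lock.readLock().lock();
        try {
            return slotToNode.get(slot);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Returns true if the cache was refreshed from some non-excluded node.
    boolean renew(Set<String> excludedNodes) {
        Map<Integer, String> fresh = null;
        // Step 1: network I/O happens with NO lock held, so readers keep running.
        for (String node : nodes) {
            if (excludedNodes.contains(node)) {
                continue; // skip the node that just failed
            }
            try {
                fresh = fetcher.apply(node);
                break;
            } catch (RuntimeException e) {
                // This node is unreachable too; try the next one.
            }
        }
        if (fresh == null) {
            return false; // nothing reachable; keep the old mapping
        }
        // Step 2: the write lock is held only for the reference swap.
        Map<Integer, String> copy = new HashMap<>(fresh);
        lock.writeLock().lock();
        try {
            slotToNode = copy;
        } finally {
            lock.writeLock().unlock();
        }
        return true;
    }
}
```

With this shape, a slow or dead node delays only the thread doing the renewal. One trade-off is that several threads may fetch the slot layout concurrently, so a real implementation would probably also want to make sure only one renewal runs at a time.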

I would like to open a PR for this. Any thoughts?

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

sazzad16 commented, Apr 23, 2021 (1 reaction)

Resolved by #2514

Shawyeok commented, Apr 19, 2021 (0 reactions)

@sazzad16 Could you please take a look at #2514?
