RedissonRedLock issue

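Hello, while using RedissonRedLock I ran into what looks like a bug.

First, the Redlock algorithm states:

If and only if the client was able to acquire the lock on the majority of the Redis nodes (3 nodes here), and the time taken is less than the lock validity time, the lock is considered acquired.

However, the actual behavior of RedissonRedLock is inconsistent with this. Modify the test case testLockSuccess in RedissonRedLockTest as follows: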

    RLock lock1 = client1.getLock("lock1");
    RLock lock2 = client1.getLock("lock2");
    RLock lock3 = client2.getLock("lock3");
    
    Thread t1 = new Thread() {
        public void run() {
            lock1.lock();
        }
    };
    t1.start();
    t1.join();

    RedissonMultiLock lock = new RedissonRedLock(lock1, lock2, lock3);
    assertThat(lock.tryLock(500, 5000, TimeUnit.MILLISECONDS)).isTrue();
    lock.unlock();

    lock1.delete();

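The assert in the code above passes only when lock3 is the lock that is already held; it fails when lock1 or lock2 is held instead.

The problem does not occur when calling the public boolean tryLock() method.

After reading the source, I found the cause: when trying to acquire the already-held lock1 or lock2, the waitTime used for that single lock is the same as the waitTime of the whole RedLock (the code is in RedissonMultiLock). So as soon as the attempt on lock1 or lock2 times out, the RedLock acquisition as a whole has timed out and returns false.

Yet the Redlock algorithm states:

If the server does not respond within the specified time, the client should try another Redis instance as soon as possible.

Please help take a look at how this can be resolved. Thanks.

Reference: the Redlock algorithm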
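The effect described above can be reproduced without Redis in a small model. The sketch below is an illustration only, not Redisson code: `RedLockWaitSketch`, `tryAcquireMajority`, and the `splitWait` flag are hypothetical names, and each `java.util.concurrent.Semaphore` merely stands in for one per-node lock. With `splitWait` set to `false`, every lock is tried with the full remaining wait time, as the issue describes; with `true`, the remaining time is divided by the number of locks, which is one possible way (an assumption, not a confirmed fix) to move on to the next instance quickly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Simplified model, NOT Redisson code: each Semaphore stands in for one per-node lock.
class RedLockWaitSketch {

    // Tries to acquire a majority of the locks within waitMillis.
    // splitWait == false: each lock may consume the whole remaining budget.
    // splitWait == true:  each lock gets only remaining / locks.size() (assumed strategy).
    static boolean tryAcquireMajority(List<Semaphore> locks, long waitMillis, boolean splitWait) {
        int quorum = locks.size() / 2 + 1;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitMillis);
        List<Semaphore> held = new ArrayList<>();

        for (Semaphore lock : locks) {
            long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remaining <= 0) {
                break; // the overall wait budget is used up
            }
            long perLockWait = splitWait ? Math.max(remaining / locks.size(), 1) : remaining;
            try {
                if (lock.tryAcquire(perLockWait, TimeUnit.MILLISECONDS)) {
                    held.add(lock);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }

        if (held.size() >= quorum) {
            return true; // caller now owns the locks in 'held'
        }
        for (Semaphore lock : held) {
            lock.release(); // no majority: give back whatever was acquired
        }
        return false;
    }

    public static void main(String[] args) {
        Semaphore lock1 = new Semaphore(0); // zero permits: already held by someone else
        Semaphore lock2 = new Semaphore(1);
        Semaphore lock3 = new Semaphore(1);
        List<Semaphore> locks = Arrays.asList(lock1, lock2, lock3);

        System.out.println("full wait per lock:  " + tryAcquireMajority(locks, 300, false));
        System.out.println("split wait per lock: " + tryAcquireMajority(locks, 300, true));
    }
}
```

In this model the first call should print `false`, because the held lock1 uses up the entire 300 ms before lock2 and lock3 are ever tried, while the second call should print `true`, because lock1 is abandoned after roughly 100 ms and the remaining two locks form a majority.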

Issue Analytics

Created 6 years ago
Comments: 8 (4 by maintainers)

Top GitHub Comments

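JackEggie commented, Dec 13, 2017

I also found that line 194 of the public boolean tryLock(long waitTime, long leaseTime, TimeUnit unit) method (code location) behaves inconsistently with the original text of the Redlock algorithm (the bolded part below).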

The client computes how much time elapsed in order to acquire the lock, by subtracting from the current time the timestamp obtained in step 1. If and only if the client was able to acquire the lock in the majority of the instances (at least 3), and the total time elapsed to acquire the lock is less than lock validity time, the lock is considered to be acquired.

If the lock was acquired, its validity time is considered to be the initial validity time minus the time elapsed, as computed in step 3.
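The two quoted steps amount to a small calculation, sketched below under stated assumptions. `RedLockValiditySketch` and `remainingValidityMillis` are hypothetical names used only for illustration and do not come from Redisson; the method takes the number of instances that granted the lock, the total number of instances, the initial validity (lease) time, and the time spent acquiring.

```java
import java.util.OptionalLong;

// Illustration of the quoted Redlock steps; NOT Redisson code.
class RedLockValiditySketch {

    // Returns the remaining validity time if the lock counts as acquired,
    // or an empty OptionalLong if it does not.
    static OptionalLong remainingValidityMillis(int acquired, int totalNodes,
                                                long leaseMillis, long elapsedMillis) {
        int quorum = totalNodes / 2 + 1;
        // Acquired only with a majority AND elapsed time below the validity time.
        if (acquired < quorum || elapsedMillis >= leaseMillis) {
            return OptionalLong.empty();
        }
        // Validity time = initial validity time minus the time spent acquiring.
        return OptionalLong.of(leaseMillis - elapsedMillis);
    }

    public static void main(String[] args) {
        // 3 of 5 instances granted the lock after 120 ms, with a 5000 ms lease.
        System.out.println(remainingValidityMillis(3, 5, 5000, 120)); // OptionalLong[4880]
        // Only 2 of 5 instances granted the lock: no majority.
        System.out.println(remainingValidityMillis(2, 5, 5000, 120)); // OptionalLong.empty
    }
}
```

The point of the second quoted paragraph is the subtraction: a caller that was granted the lock after 120 ms should treat it as valid for 4880 ms, not for the full 5000 ms.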

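JackEggie commented, Dec 11, 2017

Following your suggestion, I changed the code in testLockSuccess so that lock1 is locked first, and the code still runs without problems.

The bug I am pointing out appears only when lock1 or lock2 is locked first and the public boolean tryLock(long waitTime, long leaseTime, TimeUnit unit) method is used at line 184 of testLockSuccess; only then does lock acquisition fail. Calling the no-argument lock() or tryLock() directly does not show this behavior.

Moreover, the code at line 184 has no assert, so no matter how it is changed, the test case always passes.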