ZooKeeperNetEx connection loss issue: an acquired lock seems not to be released

See original GitHub issue

I’ve implemented a lock with ZooKeeper using this configuration:

  1. DistributedLock.ZooKeeper - Version="1.0.0"
  2. dotnet version 6.0
  3. Hosted on K8s (one pod, there is no concurrent request)
  4. ZooKeeper server configuration on K8s:

    version: "3.9"
    services:
      zk1:
        container_name: zk1
        hostname: zk1
        image: bitnami/zookeeper:3.8.0-debian-11-r57
        ports:
          - 2181:2181
        environment:
          - ALLOW_ANONYMOUS_LOGIN=yes
          - ZOO_SERVER_ID=1
          - ZOO_SERVERS=0.0.0.0:2888:3888
          - ZOO_MAX_CLIENT_CNXNS=500

The application contains several worker services, each working with a different lock key. Each worker periodically tries to acquire its lock and run some processing. They seem to work without problems, but after a while I get this exception:

    Locking failed.Exception of type 'org.apache.zookeeper.KeeperException+ConnectionLossException' was thrown.
    org.apache.zookeeper.KeeperException+ConnectionLossException: Exception of type 'org.apache.zookeeper.KeeperException+ConnectionLossException' was thrown.

It seems the lock cannot be acquired because it has not been released, although there is no concurrent request for the lock key.

The LockService code in .NET:

    private TimeSpan _connectionTimeoutInSecond = TimeSpan.FromSeconds(30);
    private TimeSpan _waitingForLockInSecond = TimeSpan.FromSeconds(30);

    public async Task<LockProcessResult> DoActionWithLockAsync(string lockKey, Func<Task> func)
    {
        var processResult = new LockProcessResult();
        try
        {
            var @lock = new ZooKeeperDistributedLock(lockKey, _configuration.ConnectionString, opt =>
            {
                opt.ConnectTimeout(_connectionTimeoutInSecond);
            });

            await using (var handle = await @lock.TryAcquireAsync(timeout: _waitingForLockInSecond))
            {
                if (handle != null)
                {
                    // I have the lock
                    await func();
                }
                else
                {
                    processResult.SetException(new LockAcquisitionFailedException(lockKey));
                }
            }
        }
        catch (Exception ex)
        {
            // I got the exceptions here
            processResult.SetException(ex);
        }

        return processResult;
    }

I would appreciate any suggestions.

Issue Analytics

  • State: open
  • Created: 10 months ago
  • Comments: 50 (22 by maintainers)

Top GitHub Comments

3 reactions
devlnull commented, Jan 2, 2023

Unfortunately, we sometimes get a connection loss, but it goes away within a minute.

Exception of type 'org.apache.zookeeper.KeeperException+ConnectionLossException' was thrown.
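
Since the connection loss is described here as transient, one possible stopgap is to retry the failed operation with a short backoff. The following is only a minimal sketch under that assumption: `TransientRetry` and `RunAsync` are hypothetical names, not part of DistributedLock.ZooKeeper or ZooKeeperNetEx, and the attempt count and delay are illustrative values. It does not address the underlying cause of the exception.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not a library API): retries an async operation
// when the caller-supplied predicate classifies the exception as transient.
public static class TransientRetry
{
    public static async Task<T> RunAsync<T>(
        Func<Task<T>> action,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,              // illustrative default
        TimeSpan? initialDelay = null)    // defaults to 1 second below
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        var delay = initialDelay ?? TimeSpan.FromSeconds(1);

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await action();
            }
            // The filter only matches while attempts remain, so the last
            // failure (or any non-transient one) propagates to the caller.
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                await Task.Delay(delay);
                delay += delay; // double the wait before the next attempt
            }
        }
    }
}
```

With the code from the question, a call might look like `TransientRetry.RunAsync(() => DoActionWithLockAsync(key, work), ex => ex is KeeperException.ConnectionLossException)`, provided the exception is allowed to propagate instead of being stored in `processResult`. Note that a retry runs `work` again, so this is only reasonable when the protected action is idempotent or when the failure is known to have occurred before the action started.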

1 reaction
Jetski5822 commented, Jul 5, 2023

@madelson we have just tested your change locally and in a K8s cluster, and that code change fixed the issue. Could you open a PR for this change against the main repo?


Top Results From Across the Web

  • How to handle Apache Curator Distributed Lock loss of ...
    This all seems fine until one of the java process loses the connection with zookeeper after it has acquired a lock. According to...
  • Losing connection to Zookeeper intermittently · Issue #1917
    I have a strange issue with Zookeeper membership provider. We have different production deployments on different servers for different ...
  • ZooKeeper cannot create a lock
    In some situations, ZooKeeper cannot successfully create a lock because the ZooKeeper ensemble is offline. You can recover from this situation by ...
  • Hive lock left behind
    When it happens again you can workaround the issue by deleting the lock inside zookeeper. This will be easier and quicker than restarting...
  • Distributed Lock implementation using Zookeeper in .NET Core
    Generally, Locks used to synchronize access to shared resource by ... When the listener releases the locks, the listener can acquire the locks...
