Postgres lock is not released in specific multi-threaded scenarios
Hello, I started using this library in a project I am working on and ran into a strange issue. It seems somewhat similar to another open issue, #115, but they might be different.
A little about my app: it runs on .NET 6 and uses EF Core as the ORM to perform operations against a Postgres database. It receives messages concurrently on different threads. For each message, it creates a connection to the database, begins a transaction, and then tries to acquire a Postgres advisory lock through this library's API using that connection. If the lock is acquired, some business logic runs and then the lock is released. If the lock is not acquired, the thread retries a couple of times, a few milliseconds apart, and throws an exception if it still cannot get the lock. All of this is done fully asynchronously.
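The retry behaviour described above can be sketched as a small helper. This is a minimal illustration, not the poster's actual code: `AcquireWithRetriesAsync` and its parameters are hypothetical names, and the acquire call is abstracted as a delegate that returns null when the lock is busy (with DistributedLock.Postgres it could wrap a short `TryAcquireAsync` call). It is written as C# top-level local functions so it runs as a standalone program.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

#nullable enable

// Hypothetical helper: retries a short acquire attempt a fixed number of times,
// waiting between attempts, and throws if the lock never becomes available.
// `tryAcquire` returns null when the lock is busy (as TryAcquireAsync does on failure).
static async Task<T> AcquireWithRetriesAsync<T>(
    Func<CancellationToken, Task<T?>> tryAcquire,
    int maxAttempts,
    TimeSpan delayBetweenAttempts,
    CancellationToken cancellationToken = default)
    where T : class
{
    for (var attempt = 1; attempt <= maxAttempts; attempt++)
    {
        var handle = await tryAcquire(cancellationToken);
        if (handle is not null)
        {
            return handle; // The caller owns the handle and must dispose it to release the lock.
        }

        if (attempt < maxAttempts)
        {
            await Task.Delay(delayBetweenAttempts, cancellationToken);
        }
    }

    throw new TimeoutException($"Lock not acquired after {maxAttempts} attempts.");
}
```

Note that every attempt that gives up is an extra database round trip, and a thread that gives up leaves the queue of waiters before trying again.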
The lock-related code looks like this (simplified):
```csharp
using Medallion.Threading.Postgres;
using Microsoft.EntityFrameworkCore;

var dataContext = new DataContext();
var transaction = await dataContext.Database.BeginTransactionAsync(token);
var dbConnection = dataContext.Database.GetDbConnection();
var lockKey = GetLockKey(); // This can be any long
var postgresAdvisoryLockKey = new PostgresAdvisoryLockKey(lockKey);
var postgresDistributedLock = new PostgresDistributedLock(postgresAdvisoryLockKey, dbConnection);
await using (var distributedLockHandle = await postgresDistributedLock.TryAcquireAsync(timeout, token))
{
    if (distributedLockHandle != null) // null means the lock was not acquired within the timeout
    {
        // Some business logic...
        // Transaction is committed here
    }
}
// Transaction and Data Context are disposed here
```
This works, but only in certain multi-threaded scenarios. I ran some short load tests by sending messages to the app, configured with different numbers of threads that receive messages and then try to acquire the same lock simultaneously. With 4 or fewer threads competing for the lock, it is acquired and released correctly. With more than 4 threads (say 8), however, after a few dozen seconds one of the threads supposedly releases the lock, and from then on no other thread can acquire it, as if it had never actually been released. This happened on every run. I also tried the synchronous Dispose of the lock handle, but it made no difference.
When I look at the pg_locks view in Postgres with this query:
```sql
SELECT * FROM pg_locks WHERE locktype = 'advisory';
```
While the threads hang on the lock (8 threads), its output stays the same until I kill the app.
It really does look like the lock is suddenly not being released, and I do not understand why. I added logs around everything, and the app seems to behave correctly. Then I found the issue mentioned at the start and began to wonder whether there is a bug in the library.
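To match advisory rows in the pg_locks output to a particular `long` key, it helps to know how Postgres displays the key: per the PostgreSQL pg_locks documentation, a single bigint advisory key appears with its high-order half in `classid`, its low-order half in `objid`, and `objsubid` equal to 1. The sketch below assumes the library takes a `long` key with that single-bigint form; the helper names are hypothetical. It converts in both directions and builds a diagnostic query that joins `pg_stat_activity` to show which sessions hold (`granted`) or wait for the key.

```csharp
using System;

// pg_locks shows a single-bigint advisory key split into two oid (unsigned 32-bit) columns:
// high-order half in classid, low-order half in objid, with objsubid = 1.
static (uint ClassId, uint ObjId) ToPgLocksColumns(long key) =>
    ((uint)((ulong)key >> 32), (uint)((ulong)key & 0xFFFF_FFFF));

// Reverses the split, like the documented (classid::bigint << 32) | objid::bigint.
static long FromPgLocksColumns(uint classId, uint objId) =>
    (long)(((ulong)classId << 32) | objId);

// Hypothetical diagnostic query: which sessions hold (granted) or wait for (not granted) this key.
static string PgLocksQueryFor(long key)
{
    var (classId, objId) = ToPgLocksColumns(key);
    return "SELECT l.pid, l.granted, a.state, a.query " +
           "FROM pg_locks l JOIN pg_stat_activity a ON a.pid = l.pid " +
           $"WHERE l.locktype = 'advisory' AND l.classid = '{classId}' " +
           $"AND l.objid = '{objId}' AND l.objsubid = 1";
}
```

A row with `granted = true` whose session is idle, while the other rows show `granted = false`, would point to a session that still owns the lock even though the application believes it was released.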
Am I using the library the wrong way? Could the Dispose/DisposeAsync or TryAcquireAsync methods be throwing and swallowing exceptions, returning null, or failing to release the lock? Is there any way to check this? I will be glad to hear your thoughts and answer any questions.
Thanks.
Top GitHub Comments
Got confirmation that this is Postgres behavior: https://www.postgresql.org/message-id/17686-fb1fa3870138e394%40postgresql.org
Working on a simple fix: after a timeout, re-check whether the lock was actually acquired.
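A minimal sketch of the re-check idea, with both database calls abstracted as delegates so only the control flow is shown. `AcquireWithRecheckAsync` and its parameters are hypothetical, and this is not the library's actual fix. The assumption is that a timed-out acquire cannot be trusted to mean the lock was not granted, so the same session then asks the server whether it holds the key (for example, a pg_locks lookup for the key filtered on `pid = pg_backend_pid()` and `granted`).

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch: a timed-out acquire is not treated as final, because the
// lock may have been granted just as the timeout fired. Both delegates must run
// on the same database session (connection) that attempted the acquire.
static async Task<bool> AcquireWithRecheckAsync(
    Func<Task<bool>> tryAcquireWithTimeout, // returns false when the wait timed out
    Func<Task<bool>> sessionHoldsLock)      // e.g. query pg_locks for this key with pid = pg_backend_pid() AND granted
{
    if (await tryAcquireWithTimeout())
    {
        return true;
    }

    // Timed out: check whether this session ended up owning the lock anyway.
    // If it does, report success so the caller releases it later instead of leaking it.
    return await sessionHoldsLock();
}
```

Reporting success here is one possible choice; releasing the lock and reporting failure would also avoid leaving an orphaned lock held by the session.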
Glad this workaround seems effective. Yes, I was referring to looping with retries and sleeps vs. a single wait with a longer timeout. In general I think you'd want the single wait, since you get better fairness (threads stay in line instead of repeatedly giving up) and lower resource usage from fewer DB round trips.