Losing the connection to a SQL Server database does not always cause the lock to be lost

If I run my code (simplified below) as two separate instances locally (dotnet run or VS for the first, dotnet run --no-build for the second) and restart my database a couple of times, the odds are quite high that eventually an instance will just keep going without the lock (in the while loop). It can take as few as 3-4 restarts. My understanding was that accessing handle.HandleLostToken.IsCancellationRequested makes sure that the lock still exists? ~~Or is that not working together with CancellationTokenSource.CreateLinkedTokenSource?~~ I just tried without CancellationTokenSource.CreateLinkedTokenSource and got the same result. Not sure what is going on…

using Medallion.Threading;

namespace Sample;

public class TestBackgroundService : BackgroundService
{
    private readonly IServiceScopeFactory _serviceScopeFactory;
    private readonly ILogger<TestBackgroundService> _logger;

    public TestBackgroundService(IServiceScopeFactory serviceScopeFactory, ILogger<TestBackgroundService> logger)
    {
        _serviceScopeFactory = serviceScopeFactory;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await using var scope = _serviceScopeFactory.CreateAsyncScope();

                var distributedLockProvider = scope.ServiceProvider.GetRequiredService<IDistributedLockProvider>();

                _logger.LogInformation("Will try to acquire lock.");

                await using var handle = await distributedLockProvider.AcquireLockAsync(
                    "a1416b2940b34bbb9189caaa13f11b1a",
                    cancellationToken: stoppingToken
                );

                _logger.LogInformation("Acquired lock.");

                handle.HandleLostToken.Register(
                    () => _logger.LogError("Lost lock for job {Job Name}.", nameof(TestBackgroundService))
                );

                var stoppingCts = CancellationTokenSource.CreateLinkedTokenSource(
                    stoppingToken,
                    handle.HandleLostToken
                );

                if (stoppingCts.Token.IsCancellationRequested)
                {
                    return;
                }

                while (!stoppingCts.IsCancellationRequested) // This evaluates to true sometimes even if the database has been restarted
                {
                    _logger.LogInformation("Doing stuff.");

                    try
                    {
                        await Task.Delay(TimeSpan.FromSeconds(30), stoppingCts.Token);
                    }
                    catch (TaskCanceledException)
                    { }
                }
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Something went wrong.");
            }
        }
    }
}

Update: I can reproduce it with just one instance (even if the log gets a bit trickier to follow then):

using Medallion.Threading;

namespace Samples;

public sealed class Job1 : JobBase
{
    public Job1(IServiceScopeFactory serviceScopeFactory, ILogger<Job1> logger) : base(serviceScopeFactory, logger) { }
}

public sealed class Job2 : JobBase
{
    public Job2(IServiceScopeFactory serviceScopeFactory, ILogger<Job2> logger) : base(serviceScopeFactory, logger) { }
}

public abstract class JobBase : BackgroundService
{
    private readonly IServiceScopeFactory _serviceScopeFactory;
    private readonly ILogger _logger;

    public JobBase(IServiceScopeFactory serviceScopeFactory, ILogger logger)
    {
        _serviceScopeFactory = serviceScopeFactory;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await using var scope = _serviceScopeFactory.CreateAsyncScope();

                var distributedLockProvider = scope.ServiceProvider.GetRequiredService<IDistributedLockProvider>();

                _logger.LogInformation("Will try to acquire lock.");

                await using var handle = await distributedLockProvider.AcquireLockAsync(
                    "a1416b2940b34bbb9189caaa13f11b1a",
                    cancellationToken: stoppingToken
                );

                _logger.LogInformation(
                    "Acquired {CancelableDescription} lock.",
                    handle.HandleLostToken.CanBeCanceled ? "cancelable" : "uncancelable"
                );

                await using var _ = handle.HandleLostToken.Register(() => _logger.LogError("Lost lock."));

                using var stoppingCts = CancellationTokenSource.CreateLinkedTokenSource(
                    stoppingToken,
                    handle.HandleLostToken
                );

                if (stoppingCts.Token.IsCancellationRequested)
                {
                    return;
                }

                while (!stoppingCts.IsCancellationRequested) // This evaluates to true sometimes even if the database has been restarted
                {
                    _logger.LogInformation("Doing stuff.");

                    try
                    {
                        await Task.Delay(TimeSpan.FromSeconds(30), stoppingCts.Token);
                    }
                    catch (TaskCanceledException) { }

                    if (!stoppingToken.IsCancellationRequested)
                    {
                        _logger.LogInformation("Cancellation is not requested.");
                    }
                }
            }
            catch (Exception exception)
            {
                _logger.LogError("Exception {Exception} thrown.", exception.GetType());
            }
        }
    }
}
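
The linked-token pattern used in both samples can be checked in isolation, without a database or the lock library. The following is a minimal sketch using only the .NET base class library; the two CancellationTokenSource instances are stand-ins (assumptions for illustration) for stoppingToken and handle.HandleLostToken. It suggests that CreateLinkedTokenSource itself propagates cancellation as expected, and that linking to a token that cannot be canceled (the case the "cancelable"/"uncancelable" log line checks for) simply never fires.

```csharp
using System;
using System.Threading;

public static class LinkedTokenDemo
{
    // True when cancelling only the "handle lost" source also cancels the linked source.
    public static bool LinkedTokenFollowsSecondSource()
    {
        using var stopping = new CancellationTokenSource(); // stand-in for stoppingToken
        using var lost = new CancellationTokenSource();     // stand-in for HandleLostToken
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(stopping.Token, lost.Token);

        lost.Cancel();

        return linked.IsCancellationRequested && !stopping.IsCancellationRequested;
    }

    // True when linking to a token that can never be canceled leaves the linked source untouched.
    public static bool LinkedTokenIgnoresUncancelableToken()
    {
        using var stopping = new CancellationTokenSource();
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(stopping.Token, CancellationToken.None);

        return !CancellationToken.None.CanBeCanceled && !linked.IsCancellationRequested;
    }

    public static void Main()
    {
        Console.WriteLine(LinkedTokenFollowsSecondSource());
        Console.WriteLine(LinkedTokenIgnoresUncancelableToken());
    }
}
```

If both checks pass, a loop that keeps running after a restart points at the source token never being canceled rather than at the linking step, which matches the observation that removing CreateLinkedTokenSource made no difference.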

Issue Analytics

Created a year ago
Comments: 11 (4 by maintainers)

Top GitHub Comments

OskarKlintrot commented, Jul 6, 2022

I just finished a little experiment. I raised the time before retrying to acquire a lock to 10 sec so the jobs would take turns picking up the lock. I also raised the time before restarting SQL Server to 60 sec. Then I restarted SQL Server 100 times and everything still looks perfectly fine.

(I used a static field to count the lost locks, hence 100 times)
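
A static-field counter like the one mentioned above could look roughly like this. It is only a sketch: the class name LostLockCounter, the Track helper, and the simulated token source are hypothetical and not taken from the original experiment, and the CancellationTokenSource stands in for a real handle.HandleLostToken.

```csharp
using System;
using System.Threading;

public static class LostLockCounter
{
    // Shared across all jobs in the process; updated atomically.
    private static int _lostLocks;

    public static int LostLocks => Volatile.Read(ref _lostLocks);

    // Registers a callback that bumps the counter when the given token is canceled.
    public static CancellationTokenRegistration Track(CancellationToken handleLostToken) =>
        handleLostToken.Register(() => Interlocked.Increment(ref _lostLocks));

    public static void Main()
    {
        for (var i = 0; i < 3; i++)
        {
            // Simulates one lock handle whose connection is lost.
            using var simulatedHandleLost = new CancellationTokenSource();
            using var registration = Track(simulatedHandleLost.Token);
            simulatedHandleLost.Cancel();
        }

        Console.WriteLine(LostLocks);
    }
}
```

Interlocked.Increment is used because the callbacks of several jobs may run on different threads at the same time.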

OskarKlintrot commented, Jul 6, 2022

I installed the prerelease version and restarted the server 10 times (for ($num = 1 ; $num -le 10 ; $num++){ net stop MSSQLSERVER; net start MSSQLSERVER; Start-Sleep -Seconds 10; Write-Host "Restarted $num times" }) and everything seems to be working just as it should now!

As for what I changed:

Oh, that explains it! Nice catch!

I ran into this too.

Thanks for the explanation, which makes sense! What I did instead is basically this:

CancellationTokenRegistration? cancellationTokenRegistration = null;

try
{
    cancellationTokenRegistration = handle.HandleLostToken.Register(() => _logger.LogError("Lost lock."));
}
catch (Exception exception)
{ }
finally
{
    if (cancellationTokenRegistration.HasValue)
    {
        await cancellationTokenRegistration.Value.DisposeAsync();
    }
}
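
One detail that may be worth keeping in mind with this pattern: disposing a CancellationTokenRegistration unregisters its callback, so the callback only runs if the token is canceled while the registration is still alive. The sketch below illustrates this with the base class library only; the simulated token source is an assumption standing in for handle.HandleLostToken, and the class and method names are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RegistrationLifetimeDemo
{
    // Returns how many times the callback ran for a single cancellation.
    public static async Task<int> CountCallbacksAsync(bool disposeBeforeCancel)
    {
        var calls = 0;
        using var simulatedHandleLost = new CancellationTokenSource();
        var registration = simulatedHandleLost.Token.Register(() => calls++);

        if (disposeBeforeCancel)
        {
            // Unregisters the callback before the token is ever canceled.
            await registration.DisposeAsync();
        }

        simulatedHandleLost.Cancel();

        // Disposing again is harmless; this mirrors the finally block above.
        await registration.DisposeAsync();
        return calls;
    }

    public static async Task Main()
    {
        Console.WriteLine(await CountCallbacksAsync(disposeBeforeCancel: true));
        Console.WriteLine(await CountCallbacksAsync(disposeBeforeCancel: false));
    }
}
```

In other words, the work that should be guarded by the "Lost lock." callback needs to happen between the Register call and the disposal in the finally block.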