question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Losing connection to sql server database is not always causing the lock to be lost

See original GitHub issue

If I run my code (simplified below) with two different instances locally (dotnet run or VS for the first and dotnet run --no-build for the second) and restart my db a couple of times odds are quite high that eventually an instance will just keep going without the lock (in the while-loop). It could be as few restarts as 3-4 restarts. My understanding where that when you access handle.HandleLostToken.IsCancellationRequested it will make sure that the lock still exists? Or is that not working together with CancellationTokenSource.CreateLinkedTokenSource? I just tried without CancellationTokenSource.CreateLinkedTokenSource and get the same result. Not sure what is going on…

using Medallion.Threading;

namespace Sample;

public class TestBackgroundService : BackgroundService
{
    private readonly IServiceScopeFactory _serviceScopeFactory;
    private readonly ILogger<TestBackgroundService> _logger;

    public TestBackgroundService(IServiceScopeFactory serviceScopeFactory, ILogger<TestBackgroundService> logger)
    {
        _serviceScopeFactory = serviceScopeFactory;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await using var scope = _serviceScopeFactory.CreateAsyncScope();

                var distributedLockProvider = scope.ServiceProvider.GetRequiredService<IDistributedLockProvider>();

                _logger.LogInformation("Will try to acquire lock.");

                await using var handle = await distributedLockProvider.AcquireLockAsync(
                    "a1416b2940b34bbb9189caaa13f11b1a",
                    cancellationToken: stoppingToken
                );

                _logger.LogInformation("Acquired lock.");

                handle.HandleLostToken.Register(
                    () => _logger.LogError("Lost lock for job {Job Name}.", nameof(TestBackgroundService))
                );

                var stoppingCts = CancellationTokenSource.CreateLinkedTokenSource(
                    stoppingToken,
                    handle.HandleLostToken
                );

                if (stoppingCts.Token.IsCancellationRequested)
                {
                    return;
                }

                while (!stoppingCts.IsCancellationRequested) // This evaluates to true sometimes even if the database has been restarted
                {
                    _logger.LogInformation("Doing stuff.");

                    try
                    {
                        await Task.Delay(TimeSpan.FromSeconds(30), stoppingCts.Token);
                    }
                    catch (TaskCanceledException)
                    { }
                }
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Something went wrong.");
            }
        }
    }
}


Update: I can replicate it with just one instance (even if the log gets a bit tricker to follow then):

using Medallion.Threading;

namespace Samples;

public sealed class Job1 : JobBase
{
    public Job1(IServiceScopeFactory serviceScopeFactory, ILogger<Job1> logger) : base(serviceScopeFactory, logger) { }
}

public sealed class Job2 : JobBase
{
    public Job2(IServiceScopeFactory serviceScopeFactory, ILogger<Job2> logger) : base(serviceScopeFactory, logger) { }
}

public abstract class JobBase : BackgroundService
{
    private readonly IServiceScopeFactory _serviceScopeFactory;
    private readonly ILogger _logger;

    public JobBase(IServiceScopeFactory serviceScopeFactory, ILogger logger)
    {
        _serviceScopeFactory = serviceScopeFactory;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await using var scope = _serviceScopeFactory.CreateAsyncScope();

                var distributedLockProvider = scope.ServiceProvider.GetRequiredService<IDistributedLockProvider>();

                _logger.LogInformation("Will try to acquire lock.");

                await using var handle = await distributedLockProvider.AcquireLockAsync(
                    "a1416b2940b34bbb9189caaa13f11b1a",
                    cancellationToken: stoppingToken
                );

                _logger.LogInformation(
                    "Acquired {CancelableDescription} lock.",
                    handle.HandleLostToken.CanBeCanceled ? "cancelable" : "uncancelable"
                );

                await using var _ = handle.HandleLostToken.Register(() => _logger.LogError("Lost lock."));

                using var stoppingCts = CancellationTokenSource.CreateLinkedTokenSource(
                    stoppingToken,
                    handle.HandleLostToken
                );

                if (stoppingCts.Token.IsCancellationRequested)
                {
                    return;
                }

                while (!stoppingCts.IsCancellationRequested) // This evaluates to true sometimes even if the database has been restarted
                {
                    _logger.LogInformation("Doing stuff.");

                    try
                    {
                        await Task.Delay(TimeSpan.FromSeconds(30), stoppingCts.Token);
                    }
                    catch (TaskCanceledException) { }

                    if (!stoppingToken.IsCancellationRequested)
                    {
                        _logger.LogInformation("Cancellation is not requested.");
                    }
                }
            }
            catch (Exception exception)
            {
                _logger.LogError("Exception {Exception} thrown.", exception.GetType());
            }
        }
    }
}

Issue Analytics

  • State:closed
  • Created a year ago
  • Comments:11 (4 by maintainers)

github_iconTop GitHub Comments

1reaction
OskarKlintrotcommented, Jul 6, 2022

I just finished a little experiment. I raised the time before retrying to require a lock to 10 sec so the jobs would take turn picking the lock. I also raised the time before restarting the SQL Server to 60 sec. Then I restarted the SQL Server 100 times and everything still looks perfectly fine:

image

(I used a static field to count the lost locks, hence 100 times)

1reaction
OskarKlintrotcommented, Jul 6, 2022

I installed the prerelease version and restarted the server 10 times (for ($num = 1 ; $num -le 10 ; $num++){ net stop MSSQLSERVER; net start MSSQLSERVER; Start-Sleep -Seconds 10; Write-Host "Restarted $num times" }) and everything seem to be working just as it should now!

As for what I changed:

Oh, that explains it! Nice catch!

I ran into this too.

Thanks for the explanation, which makes sense! What I did instead is basically this:

CancellationTokenRegistration? cancellationTokenRegistration = null;

try
{
    cancellationTokenRegistration = handle.HandleLostToken.Register(() => _logger.LogError("Lost lock."));
}
catch (Exception exception)
{ }
finally
{
    if (cancellationTokenRegistration.HasValue)
    {
        await cancellationTokenRegistration.Value.DisposeAsync();
    }
}
Read more comments on GitHub >

github_iconTop Results From Across the Web

MS SQL Losing Connection to Databases
It looks like the logs report a deadlock. The users lose connection but SQL itself can't see its tables until I restart the...
Read more >
SQL Server Management Studio disconnected after a ...
If you just recreated them, it might be that you accidentally set the AUTO CLOSE of your database. This may cause such a...
Read more >
How do I troubleshoot loss of connection with SQL server
I understand that just about anything in between the application and the database could be causing the connection to be lost. I don't...
Read more >
A network-related or instance-specific error occurred - SQL ...
This error usually means that the client can't find the SQL Server instance. This issue occurs when at least one of the following...
Read more >
Troubleshoot common connection issues to Azure SQL ...
Possible causes include the following: the client tried to connect to an unsupported version of SQL Server; the server was too busy to...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found