Losing the connection to the SQL Server database does not always cause the lock to be lost
If I run my code (simplified below) with two different instances locally (dotnet run or VS for the first and dotnet run --no-build for the second) and restart my db a couple of times, odds are quite high that eventually an instance will just keep going without the lock (in the while-loop). It can take as few as 3-4 restarts. My understanding was that accessing handle.HandleLostToken.IsCancellationRequested makes sure that the lock still exists? Or does that not work together with CancellationTokenSource.CreateLinkedTokenSource? I just tried without CancellationTokenSource.CreateLinkedTokenSource and get the same result. Not sure what is going on…
using Medallion.Threading;

namespace Sample;

public class TestBackgroundService : BackgroundService
{
    private readonly IServiceScopeFactory _serviceScopeFactory;
    private readonly ILogger<TestBackgroundService> _logger;

    public TestBackgroundService(IServiceScopeFactory serviceScopeFactory, ILogger<TestBackgroundService> logger)
    {
        _serviceScopeFactory = serviceScopeFactory;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await using var scope = _serviceScopeFactory.CreateAsyncScope();
                var distributedLockProvider = scope.ServiceProvider.GetRequiredService<IDistributedLockProvider>();
                _logger.LogInformation("Will try to acquire lock.");
                await using var handle = await distributedLockProvider.AcquireLockAsync(
                    "a1416b2940b34bbb9189caaa13f11b1a",
                    cancellationToken: stoppingToken
                );
                _logger.LogInformation("Acquired lock.");
                handle.HandleLostToken.Register(
                    () => _logger.LogError("Lost lock for job {Job Name}.", nameof(TestBackgroundService))
                );
                var stoppingCts = CancellationTokenSource.CreateLinkedTokenSource(
                    stoppingToken,
                    handle.HandleLostToken
                );
                if (stoppingCts.Token.IsCancellationRequested)
                {
                    return;
                }
                while (!stoppingCts.IsCancellationRequested) // This evaluates to true sometimes even if the database has been restarted
                {
                    _logger.LogInformation("Doing stuff.");
                    try
                    {
                        await Task.Delay(TimeSpan.FromSeconds(30), stoppingCts.Token);
                    }
                    catch (TaskCanceledException)
                    { }
                }
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Something went wrong.");
            }
        }
    }
}
Update: I can replicate it with just one instance (even if the log gets a bit trickier to follow then):
using Medallion.Threading;

namespace Samples;

public sealed class Job1 : JobBase
{
    public Job1(IServiceScopeFactory serviceScopeFactory, ILogger<Job1> logger) : base(serviceScopeFactory, logger) { }
}

public sealed class Job2 : JobBase
{
    public Job2(IServiceScopeFactory serviceScopeFactory, ILogger<Job2> logger) : base(serviceScopeFactory, logger) { }
}

public abstract class JobBase : BackgroundService
{
    private readonly IServiceScopeFactory _serviceScopeFactory;
    private readonly ILogger _logger;

    public JobBase(IServiceScopeFactory serviceScopeFactory, ILogger logger)
    {
        _serviceScopeFactory = serviceScopeFactory;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await using var scope = _serviceScopeFactory.CreateAsyncScope();
                var distributedLockProvider = scope.ServiceProvider.GetRequiredService<IDistributedLockProvider>();
                _logger.LogInformation("Will try to acquire lock.");
                await using var handle = await distributedLockProvider.AcquireLockAsync(
                    "a1416b2940b34bbb9189caaa13f11b1a",
                    cancellationToken: stoppingToken
                );
                _logger.LogInformation(
                    "Acquired {CancelableDescription} lock.",
                    handle.HandleLostToken.CanBeCanceled ? "cancelable" : "uncancelable"
                );
                await using var _ = handle.HandleLostToken.Register(() => _logger.LogError("Lost lock."));
                using var stoppingCts = CancellationTokenSource.CreateLinkedTokenSource(
                    stoppingToken,
                    handle.HandleLostToken
                );
                if (stoppingCts.Token.IsCancellationRequested)
                {
                    return;
                }
                while (!stoppingCts.IsCancellationRequested) // This evaluates to true sometimes even if the database has been restarted
                {
                    _logger.LogInformation("Doing stuff.");
                    try
                    {
                        await Task.Delay(TimeSpan.FromSeconds(30), stoppingCts.Token);
                    }
                    catch (TaskCanceledException) { }
                    if (!stoppingToken.IsCancellationRequested)
                    {
                        _logger.LogInformation("Cancellation is not requested.");
                    }
                }
            }
            catch (Exception exception)
            {
                _logger.LogError("Exception {Exception} thrown.", exception.GetType());
            }
        }
    }
}
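On the question in the report about whether CancellationTokenSource.CreateLinkedTokenSource could be hiding the lost lock: the linking step can be checked in isolation, without a database. The sketch below is not the reporter's code and does not reproduce the bug. It uses a plain CancellationTokenSource as a stand-in for handle.HandleLostToken, and the names LinkedTokenDemo, LinkedSourceObservesLoss, stopping and lockLost are made up for illustration. It only suggests that once a source token is actually signalled, the linked source sees it, which would be consistent with the observation that removing the linked source gives the same result.

```csharp
using System;
using System.Threading;

public static class LinkedTokenDemo
{
    // Returns true when cancelling the stand-in "lock lost" source
    // is observed by the linked source, while the stand-in
    // "stopping" source stays uncancelled.
    public static bool LinkedSourceObservesLoss()
    {
        // Assumption: these two sources stand in for stoppingToken and
        // handle.HandleLostToken from the report; no lock is involved.
        using var stopping = new CancellationTokenSource();
        using var lockLost = new CancellationTokenSource();

        using var linked = CancellationTokenSource.CreateLinkedTokenSource(
            stopping.Token,
            lockLost.Token
        );

        bool cancelledBefore = linked.IsCancellationRequested;

        // Simulate the lock being reported as lost.
        lockLost.Cancel();

        return !cancelledBefore
            && linked.IsCancellationRequested
            && !stopping.IsCancellationRequested;
    }
}
```

Calling LinkedTokenDemo.LinkedSourceObservesLoss() should return true. If that holds, the loop in the report keeps running only because the handle-lost token itself was never signalled, not because of the linked source.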
Top GitHub Comments
I just finished a little experiment. I raised the time before retrying to acquire a lock to 10 sec so the jobs would take turns picking up the lock. I also raised the time before restarting the SQL Server to 60 sec. Then I restarted the SQL Server 100 times and everything still looks perfectly fine.
(I used a static field to count the lost locks, hence 100 times.)
I installed the prerelease version and restarted the server 10 times (for ($num = 1 ; $num -le 10 ; $num++){ net stop MSSQLSERVER; net start MSSQLSERVER; Start-Sleep -Seconds 10; Write-Host "Restarted $num times" }) and everything seems to be working just as it should now!
Oh, that explains it! Nice catch!
Thanks for the explanation, which makes sense! What I did instead is basically this: