[BUG] ServiceBusTrigger - lock renewals stop suddenly/randomly
Library name and version
Microsoft.Azure.WebJobs.Extensions.ServiceBus 5.11.0
Describe the bug
I have a WebJobs project with continuous jobs and a Service Bus trigger (topic). The trigger/function runs for quite some time (up to 40 minutes), and at some point the calls to renew the lock on the message (PeekLock mode) stop being made. This then leads to failures when the function attempts to auto-complete the message. MaxAutoLockRenewalDuration is not exceeded, as per the attached project.
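For context, lock renewal and auto-completion for the trigger are controlled through ServiceBusOptions when the extension is registered. The following is a minimal sketch of such a registration, not the configuration from the attached project: the 60-minute value is an assumption chosen only because it exceeds the 40-minute run time described above.

```csharp
using System;
using Microsoft.Extensions.Hosting;

// Sketch only: the values below are assumptions, not the repro project's settings.
var host = new HostBuilder()
    .ConfigureWebJobs(builder =>
    {
        builder.AddServiceBus(options =>
        {
            // Upper bound on how long the extension keeps renewing the message lock.
            // Assumed value; it must be longer than the longest function run.
            options.MaxAutoLockRenewalDuration = TimeSpan.FromMinutes(60);

            // Complete the message automatically when the function returns successfully.
            options.AutoCompleteMessages = true;
        });
    })
    .Build();

await host.RunAsync();
```

With a setup along these lines, a function that finishes in 40 minutes should still hold a valid lock when auto-complete runs, which is why the lost lock described here is unexpected.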
This is especially hard to deal with because all the work carried out by my function completes, but the message goes back to the queue and is re-processed in a retry loop. You can see the output from AppInsights (in Rider) in the screenshot below.
Expected behavior
The message lock should be renewed. Auto-complete should successfully complete the message.
Actual behavior
The message lock is not renewed. Auto-complete fails to complete the message.
Reproduction Steps
You’ll need to set up Service Bus in appsettings.json and create the required messaging entities (check TestTrigger.cs). It takes a while - I’ve seen this 5-6 times in the last 2 days when running this project locally.
ServiceBusLockLostRepro.zip
Environment
Hosting: Azure AppService (but the same can be reproduced locally, running against the environment below)
.NET SDK:
Version: 7.0.203
Commit: 5b005c19f5
Runtime Environment:
OS Name: Windows
OS Version: 10.0.19044
OS Platform: Windows
RID: win10-x64
Base Path: C:\Program Files\dotnet\sdk\7.0.203\
Host:
Version: 7.0.5
Architecture: x64
Commit: 8042d61b17
The issue is IDE-agnostic. I ran it from within Rider as well as using CLI.
Issue Analytics
- Created 3 months ago
- Comments: 20 (10 by maintainers)
Top GitHub Comments
Hi @pzaj2. To be clear - if the root cause turns out to be a failure to renew locks, then it is absolutely a client bug that we should fix.
However, if it is the result of intermittent network issues causing the connection or link to drop, there’s nothing that your application or the client can do to directly prevent it, unfortunately. It is something that would need to be mitigated by ensuring that the application’s processing is idempotent and can ignore duplicate data.
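As an illustration of the idempotency mitigation, here is a minimal sketch that skips work for a message ID that has already been claimed. IdempotentHandler and HandleAsync are hypothetical names, and the in-memory dictionary is an assumption made for brevity; a real application would need a durable store shared across instances so that redeliveries are recognized after a restart.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Service Bus client or the WebJobs extension.
public sealed class IdempotentHandler
{
    // In-memory record of claimed message IDs (assumption: single process, no restarts).
    private readonly ConcurrentDictionary<string, byte> _claimed = new();

    // Runs the work once per message ID. Returns false when the ID was already claimed.
    public async Task<bool> HandleAsync(string messageId, Func<Task> work)
    {
        // Claim the ID before doing the work, so a redelivery that arrives
        // while the first attempt is still running is also treated as a duplicate.
        if (!_claimed.TryAdd(messageId, 0))
        {
            return false;
        }

        try
        {
            await work();
            return true;
        }
        catch
        {
            // Release the claim so that a later redelivery can retry the failed work.
            _claimed.TryRemove(messageId, out _);
            throw;
        }
    }
}
```

In a trigger function, the MessageId property of the received ServiceBusReceivedMessage could be passed as messageId, so that a message redelivered after a lost lock does not repeat work that has already been done.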
Thus far, we haven’t been able to repro and are not seeing stress test failures for this scenario. Logs are going to be our best bet, assuming that you’re able to repro.
The long-term solution is for Service Bus to support AMQP’s durable terminus, which allows for link state to be persistent across connections. Once the service has support, we’ll add it to the client which would mitigate the “I lost my connection and now my lock is invalid” scenario. I do not have insight into the timing for the service feature, however, only that it is on the roadmap.
Closing this out as I haven’t been able to repro and we have made some improvements to make lock lost issues less likely to occur.