[BUG] ServiceBusTrigger - lock renewals stop suddenly/randomly

Library name and version

Microsoft.Azure.WebJobs.Extensions.ServiceBus 5.11.0

Describe the bug

I have a WebJobs project with continuous jobs and a Service Bus trigger on a topic. The function runs for a long time (up to 40 minutes), and at some point during the run the calls that renew the message lock (PeekLock mode) stop being made. The function then fails when it attempts to auto-complete the message. MaxAutoLockRenewalDuration is not exceeded, as the attached project shows.
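For readers unfamiliar with these settings, here is a minimal sketch of how lock renewal and auto-complete are typically configured in a WebJobs host using the 5.x Service Bus extension. The 60-minute value and the host setup are assumptions chosen for illustration. They are not taken from the attached repro project.

```csharp
using System;
using Microsoft.Extensions.Hosting;

// Sketch only: the values below are assumptions, not the reporter's actual configuration.
var host = new HostBuilder()
    .ConfigureWebJobs(b =>
    {
        b.AddServiceBus(options =>
        {
            // Upper bound on how long the extension keeps renewing the message lock.
            // It has to exceed the longest expected function run (about 40 minutes here).
            options.MaxAutoLockRenewalDuration = TimeSpan.FromMinutes(60);

            // Complete the message automatically when the function returns without throwing.
            options.AutoCompleteMessages = true;
        });
    })
    .Build();

await host.RunAsync();
```

With a configuration along these lines, the lock would be expected to stay valid for the whole run, which is why the renewals stopping early is reported as a bug.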

This is especially hard to deal with because all the work carried out by my function completes, but the message goes back to the queue and is reprocessed in a retry loop. You can see the output from AppInsights (in Rider) in the screenshot below.

Expected behavior

The message lock should be renewed. Auto-complete should successfully complete the message.

Actual behavior

The message lock is not renewed. Auto-complete fails to complete the message.

Reproduction Steps

You’ll need to set up Service Bus in appsettings.json and create the required messaging entities (check TestTrigger.cs). It takes a while to reproduce - I’ve seen this 5-6 times in the last 2 days when running this project locally. Attached project: ServiceBusLockLostRepro.zip

Environment

Hosting: Azure App Service (the same can be reproduced locally, running against the environment below)

.NET SDK:
 Version:   7.0.203
 Commit:    5b005c19f5

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.19044
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\7.0.203\

Host:
  Version:      7.0.5
  Architecture: x64
  Commit:       8042d61b17

The issue is IDE-agnostic. I ran it from within Rider as well as using CLI.

Issue Analytics

Created 3 months ago
Comments: 20 (10 by maintainers)

Top GitHub Comments

jsquire commented, Jul 6, 2023

Hi @pzaj2. To be clear: if the root cause turns out to be a failure to renew locks, then it is absolutely a client bug that we should fix.

However, if it is the result of intermittent network issues causing the connection or link to drop, there’s nothing that your application or the client can do to directly prevent it, unfortunately. It would need to be mitigated by ensuring that the application’s processing is idempotent and can ignore duplicate data.
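As a rough illustration of that mitigation, the sketch below skips work for a message ID that has already been processed. `ProcessedMessageStore` and `IdempotentHandler` are hypothetical names introduced here. They are not part of the Service Bus client or the WebJobs extension. The in-memory dictionary is also an assumption made to keep the example self-contained. A real application would likely need a durable, shared store, since a redelivered message may arrive after a restart or on another instance.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: remembers which message IDs have been fully processed.
// In-memory only, so the record is lost on restart (assumption for illustration).
public sealed class ProcessedMessageStore
{
    private readonly ConcurrentDictionary<string, bool> _done = new();

    public bool IsProcessed(string messageId) => _done.ContainsKey(messageId);

    public void MarkProcessed(string messageId) => _done[messageId] = true;
}

public static class IdempotentHandler
{
    // Runs `work` at most once per message ID across sequential deliveries.
    // Returns false when the delivery is a duplicate and the work was skipped.
    // Note: two concurrent deliveries of the same ID could both pass the check;
    // closing that gap needs an atomic claim in the backing store.
    public static bool Handle(ProcessedMessageStore store, string messageId, Action work)
    {
        if (store.IsProcessed(messageId))
        {
            return false; // Duplicate delivery: the work already completed earlier.
        }

        work();

        // Record completion only after the work succeeded, so a failed attempt is retried.
        store.MarkProcessed(messageId);
        return true;
    }
}
```

In a ServiceBusTrigger function, the ID would typically come from the `MessageId` property of the received message. If the lock is lost after the work has finished and the message is redelivered, the second delivery returns early and can be completed without repeating the work.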

Thus far, we haven’t been able to repro and are not seeing stress test failures for this scenario. Logs are going to be our best bet, assuming that you’re able to repro.

The long-term solution is for Service Bus to support AMQP’s durable terminus, which allows link state to persist across connections. Once the service has support, we’ll add it to the client, which would mitigate the “I lost my connection and now my lock is invalid” scenario. I do not have insight into the timing for the service feature, however, only that it is on the roadmap.

JoshLove-msft commented, Aug 3, 2023

Closing this out as I haven’t been able to repro and we have made some improvements to make lock lost issues less likely to occur.