[BUG] Service Bus .NET SDK randomly fails to complete messages for long running process

Describe the bug

When a Service Bus topic message takes a long time to process (in my case, between 15 and 20 minutes) and the corresponding MessageHandlerOptions is initialized as:

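// Requires: using System; using Microsoft.Azure.ServiceBus;
// HandleException is the handler's Func<ExceptionReceivedEventArgs, Task>
new MessageHandlerOptions(HandleException)
{
    MaxAutoRenewDuration = TimeSpan.FromMinutes(30), // renew the message lock automatically for up to 30 minutes
    AutoComplete = false                             // the handler completes the message explicitly
}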

then, when processing finishes (always before MaxAutoRenewDuration elapses; in my case processing took 18 minutes and MaxAutoRenewDuration was 30 minutes), the attempt to complete the message fails with an error indicating that the message lock has expired or the message has already been removed. Only a single process consumes the Service Bus topic at any given time.

Expected behavior

My understanding is that, given the value of MaxAutoRenewDuration, the message should stay locked, with the lock renewed automatically for up to that duration. If processing finishes before this duration expires, the lock should still be valid and completing the message should succeed.
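The expected behavior above can be checked against a simplified timeline model. This is a sketch under stated assumptions, not the SDK's implementation: it assumes each renewal resets the lock to "renewal time + lock duration", that no renewal starts after MaxAutoRenewDuration has elapsed, and it uses a hypothetical 10-second renewal buffer (the issue does not say what buffer the SDK's MessageReceivePump uses). `SimulateAutoRenew` and `renewBuffer` are illustrative names. It is written as a C# 9 top-level program; on the .NET Core 3.1 setup from the issue it would need to be wrapped in a `Main` method.

```csharp
using System;
using System.Collections.Generic;

// Simplified timing model of automatic lock renewal (NOT the SDK's code).
// Assumptions: a renewal resets the lock to "renewal time + lockDuration",
// renewals are attempted renewBuffer before expiry, and no renewal starts
// once maxAutoRenew has elapsed since the message was received.
(List<TimeSpan> Renewals, bool HeldAtCompletion) SimulateAutoRenew(
    TimeSpan lockDuration, TimeSpan maxAutoRenew, TimeSpan processing, TimeSpan renewBuffer)
{
    if (renewBuffer >= lockDuration)
        throw new ArgumentException("renewBuffer must be shorter than lockDuration");

    var times = new List<TimeSpan>();
    TimeSpan lockedUntil = lockDuration;          // lock granted on receive
    while (lockedUntil < processing)
    {
        TimeSpan renewAt = lockedUntil - renewBuffer;
        if (renewAt > maxAutoRenew) break;        // auto-renew window exhausted
        times.Add(renewAt);
        lockedUntil = renewAt + lockDuration;     // lock extended from the renewal time
    }
    return (times, lockedUntil >= processing);
}

// Values from the issue: 5-minute lock, 30-minute MaxAutoRenewDuration,
// 18 minutes of processing. The 10-second buffer is a hypothetical choice.
var (renewals, held) = SimulateAutoRenew(
    TimeSpan.FromMinutes(5), TimeSpan.FromMinutes(30),
    TimeSpan.FromMinutes(18), TimeSpan.FromSeconds(10));

Console.WriteLine(string.Join(", ", renewals.ConvertAll(r => r.ToString(@"mm\:ss")))); // prints "04:50, 09:40, 14:30"
Console.WriteLine($"Lock held at completion: {held}");                                 // prints "Lock held at completion: True"
```

Under these assumptions, 18 minutes of work needs only three renewals and finishes well inside the 30-minute window, so timing alone does not explain the failure. The model cannot represent a lock being dropped on a connection break, which is the cause the Service Bus team points to in the comments.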

Actual behavior

When trying to complete the message after the long-running process, the following exception was thrown:

Microsoft.Azure.ServiceBus.MessageLockLostException: The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue, or was received by a different receiver instance.
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.DisposeMessagesAsync(IEnumerable`1 lockTokens, Outcome outcome)
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.CompleteAsync(IEnumerable`1 lockTokens)
   at ServiceBusHandler.Program.HandleLongRunningMessage(Message message, CancellationToken cancellationToken) in C:\[PROGRAM_LOCATION]\Program.cs:line 46
   at Microsoft.Azure.ServiceBus.MessageReceivePump.MessageDispatchTask(Message message)

Sometimes, in the middle of processing, I got a slightly different exception, which I would guess is thrown while renewing the lock (the stack trace goes through RenewLockAsync):

Microsoft.Azure.ServiceBus.MessageLockLostException: The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue. Reference:38e353cf-423a-4421-bdc0-207ca76c7244, TrackingId:1474382e-1a29-4800-b599-c176d1559804_B1, SystemTracker:gsis-einvoice:Topic:longtopic|longsub, Timestamp:2020-04-23T06:26:07
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.OnRenewLockAsync(String lockToken)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<>c__DisplayClass74_0.<<RenewLockAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.RenewLockAsync(String lockToken)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.RenewLockAsync(Message message)
   at Microsoft.Azure.ServiceBus.MessageReceivePump.RenewMessageLockTask(Message message, CancellationToken renewLockCancellationToken)

To Reproduce

The following gist contains a .NET Core 3.1 console program with which I reproduced the problem a couple of times when running it for 10 messages:

https://gist.github.com/nianton/a1b64094d79c0da4f15037ce301d1b23

Environment:

  • Azure Service Bus resource
    • Location: West Europe
    • Tier: Standard
    • Message time to live: 14 days
    • Message lock duration: 5 mins
  • Microsoft.Azure.ServiceBus v4.1.3
  • Windows 10, .NET Core v3.1.3
  • Visual Studio 16.5.2

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 5
  • Comments: 14 (6 by maintainers)

Top GitHub Comments

1 reaction
axisc commented, Oct 27, 2020

@nianton Locks in Service Bus are volatile. The longer a message takes to process, the more susceptible it is to events that can cause the lock to be lost.

Some of these events are client restarts, service restarts, and connection breaks. The message is locked to the specific connection, so when the client reconnects and then renews or completes the message, it no longer holds the lock. The SDK's retry logic ensures that you can reconnect, but realistically you may still see occasional lock-lost exceptions.

Hope this helps.
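If locks can be lost during long processing, one common way to live with it is to treat a lock-lost error on completion as "this message may be delivered again" and make the processing idempotent. The C# 9 top-level sketch below is one hedged way to express that, not a pattern prescribed in this thread: `processedIds` and `HandleAsync` are hypothetical names, the in-memory set stands in for what would need to be a durable, shared store, and the receiver calls are passed in as delegates so the example runs without the Microsoft.Azure.ServiceBus package (InvalidOperationException stands in for MessageLockLostException).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical in-memory record of finished message IDs. A real system would
// need a durable store visible to every receiver instance.
var processedIds = new HashSet<string>();

// Runs the work at most once per message ID, then tries to complete the message.
// Returns true if the work ran on this delivery, false if it was a duplicate.
async Task<bool> HandleAsync(
    string messageId,
    Func<Task> doWork,                 // the long-running processing
    Func<Task> completeAsync,          // e.g. a wrapper around CompleteAsync(lockToken)
    Func<Exception, bool> isLockLost)  // e.g. ex => ex is MessageLockLostException
{
    bool ran = false;
    if (!processedIds.Contains(messageId))
    {
        await doWork();
        processedIds.Add(messageId);
        ran = true;
    }

    try
    {
        await completeAsync();
    }
    catch (Exception ex) when (isLockLost(ex))
    {
        // The lock is gone, so the message can be delivered again; the
        // processedIds check above then skips the expensive work.
        Console.WriteLine($"Lock lost while completing {messageId}; expecting redelivery.");
    }
    return ran;
}

// Usage with stand-ins: InvalidOperationException plays the role of a lost lock.
int runs = 0;
Func<Task> work = () => { runs++; return Task.CompletedTask; };
bool first = await HandleAsync("msg-1", work,
    () => throw new InvalidOperationException("lock lost"), ex => ex is InvalidOperationException);
bool second = await HandleAsync("msg-1", work,
    () => Task.CompletedTask, ex => ex is InvalidOperationException);
Console.WriteLine($"first: {first}, second: {second}, work runs: {runs}"); // prints "first: True, second: False, work runs: 1"
```

Passing the completion call and the lock-lost check as delegates keeps the sketch self-contained; whether skipping rework on redelivery is acceptable depends on what the processing actually does.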

0 reactions
ramya-rao-a commented, Aug 25, 2021

Hey @sikemullivan

The reply from the Service Bus team in https://github.com/Azure/azure-sdk-for-net/issues/11533#issuecomment-716899419 explains the potential reasons for losing locks. This is by design and there are no fixes that can be made to the SDK to work around this. Therefore, we closed the issue.

> If I have a process that takes 2hrs to run, the client should be able to regain itself if the connection breaks.

If the connection breaks in between these 2 hours, there is no way supported by the service today to regain the locks.

This would be a feature request for the service, which can be logged at https://github.com/Azure/azure-service-bus

Also, a similar discussion is happening in #8291. Please consider subscribing there and adding your input.
