[BUG] Infinite ServiceBus retry loop in Microsoft.Azure.WebJobs.Extensions.ServiceBus 5.0.0-beta.5

Describe the bug

When using Microsoft.Azure.WebJobs.Extensions.ServiceBus 5.0.0-beta.5, messages keep retrying indefinitely.

Expected behavior

Messages should go to the dead-letter queue.

Actual behavior (include Exception or Stack Trace)

Here is the host config logged when the Azure Function boots; it has a maxRetryCount of 2:

[2021-08-04T16:46:12.832Z] Host configuration file read:
[2021-08-04T16:46:12.833Z] {
[2021-08-04T16:46:12.834Z]   "version": "2.0",
[2021-08-04T16:46:12.840Z]   "retry": {
[2021-08-04T16:46:12.841Z]     "delayInterval": "00:00:03",
[2021-08-04T16:46:12.842Z]     "maxRetryCount": 2,
[2021-08-04T16:46:12.862Z]     "strategy": "fixedDelay"
[2021-08-04T16:46:12.864Z]   },
[2021-08-04T16:46:12.865Z]   "logging": {
[2021-08-04T16:46:12.866Z]     "logLevel": {
[2021-08-04T16:46:12.868Z]       "default": "Information"
[2021-08-04T16:46:12.873Z]     },
[2021-08-04T16:46:12.900Z]     "applicationInsights": {
[2021-08-04T16:46:12.902Z]       "samplingSettings": {
[2021-08-04T16:46:12.903Z]         "isEnabled": true,
[2021-08-04T16:46:12.905Z]         "excludedTypes": "Dependency;Event;Request"
[2021-08-04T16:46:12.906Z]       }
[2021-08-04T16:46:12.907Z]     }
[2021-08-04T16:46:12.923Z]   }
[2021-08-04T16:46:12.925Z] }

…and the Service Bus client retries (MaxRetries), which are set to 2 as well:

[2021-08-04T16:46:14.307Z] ServiceBusOptions
[2021-08-04T16:46:14.310Z] {
[2021-08-04T16:46:14.313Z]   "ClientRetryOptions": {
[2021-08-04T16:46:14.315Z]     "Mode": "Exponential",
[2021-08-04T16:46:14.317Z]     "TryTimeout": "00:01:00",
[2021-08-04T16:46:14.318Z]     "Delay": "00:00:00.8000000",
[2021-08-04T16:46:14.320Z]     "MaxDelay": "00:02:00",
[2021-08-04T16:46:14.324Z]     "MaxRetries": 2
[2021-08-04T16:46:14.326Z]   },
[2021-08-04T16:46:14.327Z]   "TransportType": "AmqpTcp",
[2021-08-04T16:46:14.329Z]   "WebProxy": "",
[2021-08-04T16:46:14.331Z]   "AutoCompleteMessages": true,
[2021-08-04T16:46:14.332Z]   "PrefetchCount": 0,
[2021-08-04T16:46:14.336Z]   "MaxAutoLockRenewalDuration": "00:05:00",
[2021-08-04T16:46:14.338Z]   "MaxConcurrentCalls": 50,
[2021-08-04T16:46:14.339Z]   "MaxConcurrentSessions": 8,
[2021-08-04T16:46:14.340Z]   "MaxMessageBatchSize": 1000,
[2021-08-04T16:46:14.341Z]   "SessionIdleTimeout": "00:01:00"
[2021-08-04T16:46:14.344Z] }

My binding looks like this, from queue-A to queue-B; I've set a function-scoped retry count of 2:

[FunctionName("LoopyLoop")]
[ExponentialBackoffRetry(2, "00:00:04", "00:15:00")]
public static async Task Run(
    [ServiceBusTrigger("queue-A", Connection = "some-user-assigned-id")] ServiceBusReceivedMessage myQueueItem,
    ServiceBusMessageActions messageActions,
    [ServiceBus("queue-B", Connection = "some-user-assigned-id")] ServiceBusSender sender,
    ILogger log)
{

If I understand correctly, for retries: the host setting is overridden by the function setting, which is 2, and the Service Bus setting is 2, which makes a total of 4 (2*2) attempts at most. I added a settlement check that dead-letters the message; with a hard-coded thrown exception in the function, this check always gets hit:

var count = myQueueItem.DeliveryCount;
if (count > 30)
{
    await messageActions.DeadLetterMessageAsync(myQueueItem, "Things that grind your gears", "Infinite Retries");
    return;
}

As you see, it gets hit at attempt 31. This is the only way my Function can stop the retry loop. Without this, it will keep going.

To Reproduce

Create a SB Premium instance
Create a SB triggered function that will send to another queue, and keep retrying by hard coding an exception…

Environment:

Windows locally or inside the Azure host, same behavior in both.

Libraries:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
    <AzureFunctionsVersion>v3</AzureFunctionsVersion>
    <_FunctionsSkipCleanOutput>true</_FunctionsSkipCleanOutput>
  </PropertyGroup>
  <ItemGroup>
    <FrameworkReference Include="Microsoft.AspNetCore.App" />

    <PackageReference Include="Azure.Storage.Blobs" Version="12.9.1" />
    <PackageReference Include="Microsoft.Azure.Cosmos.Table" Version="1.0.8" />
    <PackageReference Include="Microsoft.Azure.WebJobs" Version="3.0.27" />
    <PackageReference Include="Microsoft.Azure.WebJobs.Extensions.Http" Version="3.0.12" />
    <PackageReference Include="Microsoft.Azure.WebJobs.Extensions.ServiceBus" Version="5.0.0-beta.5" />
    <PackageReference Include="Microsoft.Graph" Version="3.20.0" />
    <PackageReference Include="Microsoft.Graph.Auth" Version="1.0.0-preview.5" />
    <PackageReference Include="Microsoft.Identity.Client" Version="4.35.1" />
    <PackageReference Include="Microsoft.IdentityModel.Protocols.OpenIdConnect" Version="6.12.0" />
    <PackageReference Include="Microsoft.NET.Sdk.Functions" Version="3.0.13" />
    <PackageReference Include="System.Diagnostics.DiagnosticSource" Version="4.7.0" />
  </ItemGroup>
  <ItemGroup>

Issue Analytics

State:
Created 2 years ago
Comments: 19 (10 by maintainers)

Top GitHub Comments

JoshLove-msft commented, Aug 6, 2021

I think there are really only two categories of retry - Function level, and Service Bus SDK level. You can kind of think of the incremented delivery count as the service retrying, but let’s treat that separately.

The function level retry is configured by adding the retry attribute or in the function settings as you mention. The function retry kicks in when your function has an unhandled exception.

The Service Bus SDK retries are configured as you show in the table. These retries occur when any transient error occurs while the trigger is attempting to receive messages and deliver them to your function, or when you interact with any of the SDK types from within your function, e.g. ServiceBusMessageActions/ServiceBusSender/etc.

Now, imagine that you have retries configured at both levels - once the SDK retries are exhausted, the function retries would kick in so you end up multiplying the max retries from both levels. On the other hand, if an error occurs from the SDK that is NOT retriable, such as the message lock being lost when attempting to complete the message, the function retry logic will still kick in even though the SDK knew not to retry. This is what we are planning on fixing in the future, and why I said using both together does not work great right now.
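The multiplication described above can be sketched as two nested loops. This is only an illustrative model, not the extension's actual implementation: NestedRetrySketch and CountAttempts are hypothetical names, and the sketch assumes that each level makes one initial attempt plus its configured number of retries.

using System;

public static class NestedRetrySketch
{
    // Hypothetical model: an outer function-level retry loop wrapping an inner
    // SDK-level retry loop. Returns how many times `operation` was invoked.
    // Assumption: each level performs 1 initial attempt + N configured retries.
    public static int CountAttempts(int functionMaxRetries, int sdkMaxRetries, Func<bool> operation)
    {
        int attempts = 0;
        for (int execution = 0; execution <= functionMaxRetries; execution++)
        {
            for (int sdkTry = 0; sdkTry <= sdkMaxRetries; sdkTry++)
            {
                attempts++;
                if (operation())
                {
                    return attempts; // success: no further retries at either level
                }
            }
            // Inner retries exhausted: in this model the failure surfaces as an
            // unhandled exception and the outer (function-level) retry takes over.
        }
        return attempts;
    }
}

Under these assumptions, CountAttempts(2, 2, () => false) returns 9 for an operation that always fails, which is more than the 4 attempts estimated in the issue description.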

I think of function level retries as being most useful when you have logic that is outside of the Service Bus SDK actions that may need to be retried. If you are only concerned about retrying SDK operations, it is not really needed.

Now let’s get to the delivery count. The delivery count of a message is incremented by the service every time it delivers the message to a receiver. So when the trigger receives a message and delivers it to your function, the delivery count is 1. If your function doesn’t complete it, the message remains in the queue. The next time the trigger receives the message, the delivery count would be 2. The function and Service Bus SDK retries will not generally cause the delivery count to be incremented - if while receiving a message, there is a transient error, the receive will be retried but the message should still have a delivery count of 1 when it is received as it was only actually delivered once. The reason the delivery count gets incremented is that the trigger just continually receives messages in a loop, so it will eventually get the same message again if you don’t complete it in your function.
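The delivery-count behavior can be modeled with a small, self-contained simulation. It does not call the Service Bus SDK: DeliveryCountSketch and Deliver are hypothetical names, and the maxDeliveryCount parameter stands in for the queue's max delivery count setting, after which the service dead-letters the message.

using System;

public static class DeliveryCountSketch
{
    // Hypothetical model of the receive loop: the message is redelivered until the
    // handler completes it or the delivery count reaches maxDeliveryCount.
    public static (int DeliveryCount, bool DeadLettered) Deliver(int maxDeliveryCount, Func<int, bool> handler)
    {
        int deliveryCount = 0;
        while (deliveryCount < maxDeliveryCount)
        {
            deliveryCount++; // incremented on every delivery to a receiver

            bool completed;
            try
            {
                completed = handler(deliveryCount);
            }
            catch (Exception)
            {
                completed = false; // unhandled exception: the message is not completed
            }

            if (completed)
            {
                return (deliveryCount, false);
            }
        }
        return (deliveryCount, true); // limit reached: the message is dead-lettered
    }
}

In this model, a handler that always throws is invoked exactly maxDeliveryCount times before the message is dead-lettered, regardless of any function-level or SDK-level retry settings.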

erwinkramer commented, Aug 5, 2021

@JoshLove-msft thanks for all your comments. I’ve checked the DeliveryCount setting on the queue itself and it was way too high. I’ve set it to 10 and it works as expected. Not an issue caused by the beta package after all.

Since we talked about so many different types of retries, I went ahead and made a table to break down the different configurations and their effect on Service Bus behavior. Does it make sense?

Retry configuration	Function retry	Function ServiceBus SDK retry	ServiceBus retry
Where to configure	Function Retry policies as decoration for a single Function, or at the retry element in Function settings	extensions:serviceBus:clientRetryOptions element in Function settings	DeliveryCount on a queue
In effect when	Transient errors caused by the function runtime	Transient errors caused by the servicebus ‘client’ SDK when sending a message or when receiving from a ServiceBus trigger	Unhandled exceptions inside the Run() of a function with a ServiceBus trigger
Compatible with ServiceBus triggers	Not at the moment	Yes	Yes
Can increment the SB DeliveryCount on retry	Maybe?	No	Yes
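
As a rough illustration of the "Where to configure" row, the first two columns might map onto host.json as follows. The retry block is copied from the host configuration logged in the issue, and the clientRetryOptions values mirror the logged ServiceBusOptions. The camelCase property names under extensions.serviceBus are an assumption based on the element path named in the table, so check them against the documentation for the extension version in use.

{
  "version": "2.0",
  "retry": {
    "strategy": "fixedDelay",
    "maxRetryCount": 2,
    "delayInterval": "00:00:03"
  },
  "extensions": {
    "serviceBus": {
      "clientRetryOptions": {
        "mode": "exponential",
        "tryTimeout": "00:01:00",
        "delay": "00:00:00.80",
        "maxDelay": "00:02:00",
        "maxRetries": 2
      }
    }
  }
}

The third column, the delivery count limit, is a property of the queue itself and is not set in host.json.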
