[BUG] Infinite ServiceBus retry loop in Microsoft.Azure.WebJobs.Extensions.ServiceBus 5.0.0-beta.5

See original GitHub issue

Describe the bug

When using Microsoft.Azure.WebJobs.Extensions.ServiceBus 5.0.0-beta.5, messages keep retrying indefinitely.

Expected behavior

Messages go to the dead-letter queue once the retries are exhausted.

Actual behavior (include Exception or Stack Trace)

Here is the host configuration logged when the Azure Function boots. It has a maxRetryCount of 2:

[2021-08-04T16:46:12.832Z] Host configuration file read:
[2021-08-04T16:46:12.833Z] {
[2021-08-04T16:46:12.834Z]   "version": "2.0",
[2021-08-04T16:46:12.840Z]   "retry": {
[2021-08-04T16:46:12.841Z]     "delayInterval": "00:00:03",
[2021-08-04T16:46:12.842Z]     "maxRetryCount": 2,
[2021-08-04T16:46:12.862Z]     "strategy": "fixedDelay"
[2021-08-04T16:46:12.864Z]   },
[2021-08-04T16:46:12.865Z]   "logging": {
[2021-08-04T16:46:12.866Z]     "logLevel": {
[2021-08-04T16:46:12.868Z]       "default": "Information"
[2021-08-04T16:46:12.873Z]     },
[2021-08-04T16:46:12.900Z]     "applicationInsights": {
[2021-08-04T16:46:12.902Z]       "samplingSettings": {
[2021-08-04T16:46:12.903Z]         "isEnabled": true,
[2021-08-04T16:46:12.905Z]         "excludedTypes": "Dependency;Event;Request"
[2021-08-04T16:46:12.906Z]       }
[2021-08-04T16:46:12.907Z]     }
[2021-08-04T16:46:12.923Z]   }
[2021-08-04T16:46:12.925Z] }

…and here are the Service Bus client retry options, where MaxRetries is 2 as well:

[2021-08-04T16:46:14.307Z] ServiceBusOptions
[2021-08-04T16:46:14.310Z] {
[2021-08-04T16:46:14.313Z]   "ClientRetryOptions": {
[2021-08-04T16:46:14.315Z]     "Mode": "Exponential",
[2021-08-04T16:46:14.317Z]     "TryTimeout": "00:01:00",
[2021-08-04T16:46:14.318Z]     "Delay": "00:00:00.8000000",
[2021-08-04T16:46:14.320Z]     "MaxDelay": "00:02:00",
[2021-08-04T16:46:14.324Z]     "MaxRetries": 2
[2021-08-04T16:46:14.326Z]   },
[2021-08-04T16:46:14.327Z]   "TransportType": "AmqpTcp",
[2021-08-04T16:46:14.329Z]   "WebProxy": "",
[2021-08-04T16:46:14.331Z]   "AutoCompleteMessages": true,
[2021-08-04T16:46:14.332Z]   "PrefetchCount": 0,
[2021-08-04T16:46:14.336Z]   "MaxAutoLockRenewalDuration": "00:05:00",
[2021-08-04T16:46:14.338Z]   "MaxConcurrentCalls": 50,
[2021-08-04T16:46:14.339Z]   "MaxConcurrentSessions": 8,
[2021-08-04T16:46:14.340Z]   "MaxMessageBatchSize": 1000,
[2021-08-04T16:46:14.341Z]   "SessionIdleTimeout": "00:01:00"
[2021-08-04T16:46:14.344Z] }

My binding looks like this. It reads from queue-A and sends to queue-B, and I have set a function-scoped retry count of 2:

[FunctionName("LoopyLoop")]
[ExponentialBackoffRetry(2, "00:00:04", "00:15:00")]
public static async Task Run(
    [ServiceBusTrigger("queue-A", Connection = "some-user-assigned-id")] ServiceBusReceivedMessage myQueueItem,
    ServiceBusMessageActions messageActions,
    [ServiceBus("queue-B", Connection = "some-user-assigned-id")] ServiceBusSender sender,
    ILogger log)
{

If I understand correctly, the function-level retry setting (2) overrides the host setting, and the Service Bus client setting is also 2, which makes a total of 2 × 2 = 4 attempts at most. I added a manual settlement check, and it always gets hit (the function throws a hard-coded exception):

var count = myQueueItem.DeliveryCount;
if (count > 30)
{
    await messageActions.DeadLetterMessageAsync(myQueueItem, "Things that grind your gears", "Infinite Retries");
    return;
}

The check fires on delivery 31. This is the only way my function can stop the retry loop; without it, the retries keep going.

To Reproduce

  1. Create a SB Premium instance
  2. Create a SB triggered function that will send to another queue, and keep retrying by hard coding an exception…

Environment:

  • Windows locally or inside the Azure host, same behavior in both.

Libraries:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
    <AzureFunctionsVersion>v3</AzureFunctionsVersion>
    <_FunctionsSkipCleanOutput>true</_FunctionsSkipCleanOutput>
  </PropertyGroup>
  <ItemGroup>
    <FrameworkReference Include="Microsoft.AspNetCore.App" />

    <PackageReference Include="Azure.Storage.Blobs" Version="12.9.1" />
    <PackageReference Include="Microsoft.Azure.Cosmos.Table" Version="1.0.8" />
    <PackageReference Include="Microsoft.Azure.WebJobs" Version="3.0.27" />
    <PackageReference Include="Microsoft.Azure.WebJobs.Extensions.Http" Version="3.0.12" />
    <PackageReference Include="Microsoft.Azure.WebJobs.Extensions.ServiceBus" Version="5.0.0-beta.5" />
    <PackageReference Include="Microsoft.Graph" Version="3.20.0" />
    <PackageReference Include="Microsoft.Graph.Auth" Version="1.0.0-preview.5" />
    <PackageReference Include="Microsoft.Identity.Client" Version="4.35.1" />
    <PackageReference Include="Microsoft.IdentityModel.Protocols.OpenIdConnect" Version="6.12.0" />
    <PackageReference Include="Microsoft.NET.Sdk.Functions" Version="3.0.13" />
    <PackageReference Include="System.Diagnostics.DiagnosticSource" Version="4.7.0" />
  </ItemGroup>
  <ItemGroup>

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 19 (10 by maintainers)

Top GitHub Comments

2 reactions
JoshLove-msft commented, Aug 6, 2021

I think there are really only two categories of retry - Function level, and Service Bus SDK level. You can kind of think of the incremented delivery count as the service retrying, but let’s treat that separately.

The function level retry is configured by adding the retry attribute or in the function settings as you mention. The function retry kicks in when your function has an unhandled exception.

The Service Bus SDK retries are configured as you show in the table. These retries occur when any transient error occurs while the trigger is attempting to receive messages and deliver them to your function, or when you interact with any of the SDK types from within your function, e.g. ServiceBusMessageActions/ServiceBusSender/etc.

Now, imagine that you have retries configured at both levels - once the SDK retries are exhausted, the function retries would kick in so you end up multiplying the max retries from both levels. On the other hand, if an error occurs from the SDK that is NOT retriable, such as the message lock being lost when attempting to complete the message, the function retry logic will still kick in even though the SDK knew not to retry. This is what we are planning on fixing in the future, and why I said using both together does not work great right now.

I think of function level retries as being most useful when you have logic that is outside of the Service Bus SDK actions that may need to be retried. If you are only concerned about retrying SDK operations, it is not really needed.

Now let’s get to the delivery count. The delivery count of a message is incremented by the service every time it delivers the message to a receiver. So when the trigger receives a message and delivers it to your function, the delivery count is 1. If your function doesn’t complete it, the message remains in the queue. The next time the trigger receives the message, the delivery count would be 2. The function and Service Bus SDK retries will not generally cause the delivery count to be incremented - if while receiving a message, there is a transient error, the receive will be retried but the message should still have a delivery count of 1 when it is received as it was only actually delivered once. The reason the delivery count gets incremented is that the trigger just continually receives messages in a loop, so it will eventually get the same message again if you don’t complete it in your function.
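To make the delivery-count behavior concrete, here is a minimal in-memory sketch. It does not call Azure or the Service Bus SDK. `DeliveryCountSimulation` and `Run` are hypothetical names, and the sketch assumes the queue dead-letters a message once its delivery count reaches the queue's max delivery count without the message being completed.

```csharp
using System;

// Toy model of the delivery-count bookkeeping described above.
// Hypothetical names; nothing here talks to a real Service Bus queue.
public static class DeliveryCountSimulation
{
    // Delivers one simulated message to `handler` until the handler succeeds
    // (the message is completed) or the delivery limit is reached
    // (the message is dead-lettered). The handler receives the delivery count.
    public static (int Deliveries, bool DeadLettered) Run(int maxDeliveryCount, Action<int> handler)
    {
        int deliveryCount = 0;
        while (deliveryCount < maxDeliveryCount)
        {
            deliveryCount++; // the service increments the count on every delivery
            try
            {
                handler(deliveryCount);
                return (deliveryCount, false); // completed, so it leaves the queue
            }
            catch (Exception)
            {
                // Not completed: the message stays in the queue and the
                // trigger's receive loop picks it up again.
            }
        }
        return (deliveryCount, true); // limit reached: moved to the dead-letter queue
    }
}
```

With a handler that always throws, `DeliveryCountSimulation.Run(10, _ => throw new InvalidOperationException())` reports 10 deliveries followed by dead-lettering. In this model, raising the limit only lengthens the loop, which would look like an endless retry when the queue's limit is set very high.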

0 reactions
erwinkramer commented, Aug 5, 2021

@JoshLove-msft thanks for all your comments. I checked the DeliveryCount setting on the queue itself and it was set far too high. I’ve now set it to 10 and it works as expected. So this was not an issue caused by the beta package after all.

Since we talked about so many different types of retries, I went ahead and made a table to break down the different configurations and their effect on Service Bus behavior. Does it make sense?

Retry configuration | Function retry | Function ServiceBus SDK retry | ServiceBus retry
Where to configure | Function Retry policies as decoration for a single Function, or at the retry element in Function settings | extensions:serviceBus:clientRetryOptions element in Function settings | DeliveryCount on a queue
In effect when | Transient errors caused by the function runtime | Transient errors caused by the servicebus ‘client’ SDK when sending a message or when receiving from a ServiceBus trigger | Unhandled exceptions inside the Run() of a function with a ServiceBus trigger
Compatible with ServiceBus triggers | Not at the moment | Yes | Yes
Can increment the SB DeliveryCount on retry | Maybe? | No | Yes