[BUG] Infinite ServiceBus retry loop in Microsoft.Azure.WebJobs.Extensions.ServiceBus 5.0.0-beta.5
Describe the bug
When using Microsoft.Azure.WebJobs.Extensions.ServiceBus 5.0.0-beta.5, messages keep retrying indefinitely.
Expected behavior
Messages should go to the dead-letter queue.
Actual behavior (include Exception or Stack Trace)
Here is the host configuration logged when the Azure Function boots. It shows a maxRetryCount of 2:
[2021-08-04T16:46:12.832Z] Host configuration file read:
[2021-08-04T16:46:12.833Z] {
[2021-08-04T16:46:12.834Z] "version": "2.0",
[2021-08-04T16:46:12.840Z] "retry": {
[2021-08-04T16:46:12.841Z] "delayInterval": "00:00:03",
[2021-08-04T16:46:12.842Z] "maxRetryCount": 2,
[2021-08-04T16:46:12.862Z] "strategy": "fixedDelay"
[2021-08-04T16:46:12.864Z] },
[2021-08-04T16:46:12.865Z] "logging": {
[2021-08-04T16:46:12.866Z] "logLevel": {
[2021-08-04T16:46:12.868Z] "default": "Information"
[2021-08-04T16:46:12.873Z] },
[2021-08-04T16:46:12.900Z] "applicationInsights": {
[2021-08-04T16:46:12.902Z] "samplingSettings": {
[2021-08-04T16:46:12.903Z] "isEnabled": true,
[2021-08-04T16:46:12.905Z] "excludedTypes": "Dependency;Event;Request"
[2021-08-04T16:46:12.906Z] }
[2021-08-04T16:46:12.907Z] }
[2021-08-04T16:46:12.923Z] }
[2021-08-04T16:46:12.925Z] }
…and the Service Bus client retry options, where MaxRetries is 2 as well:
[2021-08-04T16:46:14.307Z] ServiceBusOptions
[2021-08-04T16:46:14.310Z] {
[2021-08-04T16:46:14.313Z] "ClientRetryOptions": {
[2021-08-04T16:46:14.315Z] "Mode": "Exponential",
[2021-08-04T16:46:14.317Z] "TryTimeout": "00:01:00",
[2021-08-04T16:46:14.318Z] "Delay": "00:00:00.8000000",
[2021-08-04T16:46:14.320Z] "MaxDelay": "00:02:00",
[2021-08-04T16:46:14.324Z] "MaxRetries": 2
[2021-08-04T16:46:14.326Z] },
[2021-08-04T16:46:14.327Z] "TransportType": "AmqpTcp",
[2021-08-04T16:46:14.329Z] "WebProxy": "",
[2021-08-04T16:46:14.331Z] "AutoCompleteMessages": true,
[2021-08-04T16:46:14.332Z] "PrefetchCount": 0,
[2021-08-04T16:46:14.336Z] "MaxAutoLockRenewalDuration": "00:05:00",
[2021-08-04T16:46:14.338Z] "MaxConcurrentCalls": 50,
[2021-08-04T16:46:14.339Z] "MaxConcurrentSessions": 8,
[2021-08-04T16:46:14.340Z] "MaxMessageBatchSize": 1000,
[2021-08-04T16:46:14.341Z] "SessionIdleTimeout": "00:01:00"
[2021-08-04T16:46:14.344Z] }
My binding looks like this. It forwards from queue-A to queue-B, and I've set a function-scoped retry count of 2:
[FunctionName("LoopyLoop")]
[ExponentialBackoffRetry(2, "00:00:04", "00:15:00")]
public static async Task Run(
[ServiceBusTrigger("queue-A", Connection = "some-user-assigned-id")] ServiceBusReceivedMessage myQueueItem, ServiceBusMessageActions messageActions,
[ServiceBus("queue-B", Connection = "some-user-assigned-id")] ServiceBusSender sender, ILogger log)
{
If I understand correctly, for retries the host setting is overridden by the function setting, which is 2, and the Service Bus setting is also 2, which makes a total of (2*2) 4 attempts at most. I added a settlement guard, which always gets hit (the function throws a hard-coded exception):
var count = myQueueItem.DeliveryCount;
if(count > 30)
{
await messageActions.DeadLetterMessageAsync(myQueueItem, "Things that grind your gears", "Infinite Retries");
return;
}
As you see, it gets hit at attempt 31. This is the only way my Function can stop the retry loop. Without this, it will keep going.
To Reproduce
- Create a SB Premium instance
- Create a SB triggered function that will send to another queue, and keep retrying by hard coding an exception…
Environment:
- Windows locally or inside the Azure host, same behavior in both.
Libraries:
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>netcoreapp3.1</TargetFramework>
<AzureFunctionsVersion>v3</AzureFunctionsVersion>
<_FunctionsSkipCleanOutput>true</_FunctionsSkipCleanOutput>
</PropertyGroup>
<ItemGroup>
<FrameworkReference Include="Microsoft.AspNetCore.App" />
<PackageReference Include="Azure.Storage.Blobs" Version="12.9.1" />
<PackageReference Include="Microsoft.Azure.Cosmos.Table" Version="1.0.8" />
<PackageReference Include="Microsoft.Azure.WebJobs" Version="3.0.27" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.Http" Version="3.0.12" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.ServiceBus" Version="5.0.0-beta.5" />
<PackageReference Include="Microsoft.Graph" Version="3.20.0" />
<PackageReference Include="Microsoft.Graph.Auth" Version="1.0.0-preview.5" />
<PackageReference Include="Microsoft.Identity.Client" Version="4.35.1" />
<PackageReference Include="Microsoft.IdentityModel.Protocols.OpenIdConnect" Version="6.12.0" />
<PackageReference Include="Microsoft.NET.Sdk.Functions" Version="3.0.13" />
<PackageReference Include="System.Diagnostics.DiagnosticSource" Version="4.7.0" />
</ItemGroup>
<ItemGroup>
Issue Analytics
- Created 2 years ago
- Comments: 19 (10 by maintainers)
Top GitHub Comments
I think there are really only two categories of retry - Function level, and Service Bus SDK level. You can kind of think of the incremented delivery count as the service retrying, but let’s treat that separately.
The function level retry is configured by adding the retry attribute or in the function settings as you mention. The function retry kicks in when your function has an unhandled exception.
The Service Bus SDK retries are configured as you show in the table. These retries occur when any transient error occurs while the trigger is attempting to receive messages and deliver them to your function, or when you interact with any of the SDK types from within your function, e.g. ServiceBusMessageActions/ServiceBusSender/etc.
Now, imagine that you have retries configured at both levels - once the SDK retries are exhausted, the function retries would kick in so you end up multiplying the max retries from both levels. On the other hand, if an error occurs from the SDK that is NOT retriable, such as the message lock being lost when attempting to complete the message, the function retry logic will still kick in even though the SDK knew not to retry. This is what we are planning on fixing in the future, and why I said using both together does not work great right now.
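As a rough illustration of the multiplication described above, the worst case for a single SDK operation can be estimated as follows. This is a back-of-the-envelope sketch, not the extension's actual logic: the class and method names are hypothetical, and it assumes each level makes one initial attempt plus its configured number of retries, with every attempt failing on a transient error.

```csharp
using System;

// Hypothetical helper for illustration only; not part of any Azure SDK.
public static class RetryMath
{
    // Estimates the worst-case number of times one SDK operation is attempted
    // when both function-level and SDK-level retries are configured.
    // Assumption: each level performs 1 initial attempt plus its max retries.
    public static int WorstCaseSdkAttempts(int functionMaxRetries, int sdkMaxRetries)
    {
        if (functionMaxRetries < 0)
            throw new ArgumentOutOfRangeException(nameof(functionMaxRetries));
        if (sdkMaxRetries < 0)
            throw new ArgumentOutOfRangeException(nameof(sdkMaxRetries));

        int functionExecutions = functionMaxRetries + 1;   // initial run + function retries
        int sdkAttemptsPerExecution = sdkMaxRetries + 1;   // initial call + SDK retries
        return functionExecutions * sdkAttemptsPerExecution;
    }
}
```

With the values from this issue (a function retry count of 2 and `MaxRetries` of 2), `RetryMath.WorstCaseSdkAttempts(2, 2)` evaluates to 9 under these assumptions. Neither figure bounds how often the service redelivers the message, which is governed separately by the delivery count.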
I think of function level retries as being most useful when you have logic that is outside of the Service Bus SDK actions that may need to be retried. If you are only concerned about retrying SDK operations, it is not really needed.
Now let’s get to the delivery count. The delivery count of a message is incremented by the service every time it delivers the message to a receiver. So when the trigger receives a message and delivers it to your function, the delivery count is 1. If your function doesn’t complete it, the message remains in the queue. The next time the trigger receives the message, the delivery count would be 2. The function and Service Bus SDK retries will not generally cause the delivery count to be incremented - if while receiving a message, there is a transient error, the receive will be retried but the message should still have a delivery count of 1 when it is received as it was only actually delivered once. The reason the delivery count gets incremented is that the trigger just continually receives messages in a loop, so it will eventually get the same message again if you don’t complete it in your function.
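The delivery-count behavior described above can be modeled with a small simulation. This is a simplified sketch rather than Service Bus SDK code: `DeliveryCountModel` and its parameters are hypothetical names, the handler is assumed to fail on every delivery, and the service is assumed to dead-letter the message once the delivery count reaches the queue's max delivery count.

```csharp
using System;

// Hypothetical model of delivery counting; not Service Bus SDK code.
public static class DeliveryCountModel
{
    // Returns how many times a never-completed message is delivered before
    // it is dead-lettered, either by the handler's own guard
    // (DeliveryCount > guardThreshold) or by the queue's max delivery count.
    public static int DeliveriesUntilDeadLetter(int maxDeliveryCount, int guardThreshold)
    {
        if (maxDeliveryCount < 1)
            throw new ArgumentOutOfRangeException(nameof(maxDeliveryCount));

        int deliveryCount = 0;
        while (true)
        {
            deliveryCount++; // the service increments the count on each delivery

            // Handler guard, as in the issue: dead-letter explicitly and stop.
            if (deliveryCount > guardThreshold)
                return deliveryCount;

            // Otherwise the handler throws and the message is not completed.
            // Assumed service behavior: dead-letter once the limit is reached.
            if (deliveryCount >= maxDeliveryCount)
                return deliveryCount;
        }
    }
}
```

Under this model, a queue with a very high max delivery count (say 1000) and the guard threshold of 30 from the issue stops at delivery 31, while a max delivery count of 10 stops at delivery 10 without the guard ever firing.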
@JoshLove-msft thanks for all your comments. I checked the max delivery count on the queue itself and it was way too high. I've set it to 10 and it works as expected. This was not an issue caused by the beta package after all.
Since we talked about so many different types of retries, I went ahead and made a table to break down the different configurations and their effect on Service Bus behavior. Does it make sense?