Error with Azure ServiceBus
My team is moving from Redis and RabbitMQ to Azure Service Bus, with Azure Storage for distributed locking. With the new config pointing to the Azure services, we are seeing errors like the one below at random on the nodes. We also notice that after a workflow is updated, the update doesn't reach some nodes: they keep running the older version of the workflow. Is there something we should be looking at to diagnose these issues?
7 nodes running as Linux containers on-prem.
    builder.Services.AddElsa(options =>
        options.AddFeatures(new[] { typeof(Rentlyzer.Workflow.StartUp) }, builder.Configuration)
            #region Distributed Lock Provider
            //.ConfigureDistributedLockProvider(options => options.UseRedisLockProvider())
            .ConfigureDistributedLockProvider(options => options.UseAzureBlobLockProvider(new Uri(builder.Configuration.GetConnectionString("ElsaAzureLocking")!)))
            #endregion
            #region Service Bus Broker
            .UseAzureServiceBus(builder.Configuration.GetConnectionString("ElsaServiceBus")!)
            .UseRebusCacheSignal()
            //.PurgeAzureSubscriptionOnStartup(builder.Configuration.GetConnectionString("ElsaServiceBus")!)
            //.UseRabbitMq($"amqp://admin:justhitenter@{rabbitMQHostname}:{rabbitMQPort}")
            #endregion
    );
Rebus.Workers.ThreadPoolBased.ThreadPoolWorker[0] An error occurred when attempting to complete the transaction context
Rebus.Exceptions.RebusApplicationException: Could not complete message with ID c0d6fece-6fb8-4bf2-99c8-6c52f3a33e49 and lock token 8089ce40-f405-4e8f-906f-4db95b47924b
---> Azure.Messaging.ServiceBus.ServiceBusException: The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue. For more information please see https://aka.ms/ServiceBusExceptions . Reference:084c036d-c7e7-45f1-96c4-c5405b6a6dc1, TrackingId:f6c4058400000004006a0595649f2c0c_G18_B22, SystemTracker:G18:15106129:amqps://....ws.servicebus.windows.net/-85fe418f;54:60:62:source(address:/execute-workflow-definition-request-default,filter:[]), Timestamp:2023-06-30T19:33:38 (MessageLockLost). For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot.
at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.DisposeMessageAsync(Guid lockToken, Outcome outcome, TimeSpan timeout)
at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.CompleteInternalAsync(Guid lockToken, TimeSpan timeout)
at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.<>c.<<CompleteAsync>b__43_0>d.MoveNext()
--- End of stack trace from previous location ---
at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.<>c__22`1.<<RunOperation>b__22_0>d.MoveNext()
--- End of stack trace from previous location ---
at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1,TResult](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken, Boolean logRetriesAsVerbose)
at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1,TResult](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken, Boolean logRetriesAsVerbose)
at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken)
at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.CompleteAsync(Guid lockToken, CancellationToken cancellationToken)
at Azure.Messaging.ServiceBus.ServiceBusReceiver.CompleteMessageAsync(ServiceBusReceivedMessage message, CancellationToken cancellationToken)
at Rebus.AzureServiceBus.AzureServiceBusTransport.<>c__DisplayClass37_0.<<Receive>b__0>d.MoveNext()
--- End of inner exception stack trace ---
at Rebus.AzureServiceBus.AzureServiceBusTransport.<>c__DisplayClass37_0.<<Receive>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Rebus.Transport.TransactionContext.InvokeAsync(Func`2 actions)
at Rebus.Transport.TransactionContext.RaiseCompleted()
at Rebus.Transport.TransactionContext.Complete()
at Rebus.Workers.ThreadPoolBased.ThreadPoolWorker.ProcessMessage(TransactionContext context, TransportMessage transportMessage)
Indeed. And it's not just any single activity: the overall time a burst of execution takes needs to stay below the lock duration threshold.
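To see what that threshold currently is for the queue named in the stack trace, one option is the administration client that ships with `Azure.Messaging.ServiceBus`. This is only a sketch: it assumes a connection string with Manage rights, and the `ELSA_SERVICE_BUS` environment variable is a made-up name for illustration.

```csharp
using System;
using Azure.Messaging.ServiceBus.Administration;

// Assumption: the connection string has Manage rights on the namespace.
// ELSA_SERVICE_BUS is a hypothetical variable name used only in this sketch.
var connectionString = Environment.GetEnvironmentVariable("ELSA_SERVICE_BUS")
    ?? throw new InvalidOperationException("Connection string not set.");

var admin = new ServiceBusAdministrationClient(connectionString);

// Queue name taken from the address in the stack trace above.
QueueProperties queue = await admin.GetQueueAsync("execute-workflow-definition-request-default");
Console.WriteLine($"Lock duration: {queue.LockDuration}");

// Optional: raise the lock duration. Service Bus caps it at 5 minutes.
// queue.LockDuration = TimeSpan.FromMinutes(5);
// await admin.UpdateQueueAsync(queue);
```

Raising the value only buys time. A burst of execution that runs longer than the cap will still lose its lock, which is why the advice below matters more than the setting.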
As a general rule of thumb, activities should execute quickly. Activities that require more time should schedule a background job and suspend themselves. Once the background job completes, it should then resume the activity.
This keeps execution from exceeding the peek-lock timeout, since scheduling work using e.g. Hangfire is a quick operation.
Here’s an example of an activity that uses Hangfire to schedule work in the background, suspends itself, and gets resumed once the job completes:
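A minimal sketch of such an activity could look like the following. It assumes Elsa 2's `Activity` base class and `IWorkflowInstanceDispatcher`, plus Hangfire's `IBackgroundJobClient`; the class name, the simulated work, and the exact namespaces are assumptions, so check them against the Elsa 2 version you run.

```csharp
using System;
using System.Threading.Tasks;
using Elsa.ActivityResults;
using Elsa.Services;          // namespaces assumed from Elsa 2.x
using Elsa.Services.Models;
using Hangfire;

// Hypothetical activity name, for illustration only.
public class LongRunningActivity : Activity
{
    private readonly IBackgroundJobClient _jobClient;
    private readonly IWorkflowInstanceDispatcher _dispatcher;

    public LongRunningActivity(IBackgroundJobClient jobClient, IWorkflowInstanceDispatcher dispatcher)
    {
        _jobClient = jobClient;
        _dispatcher = dispatcher;
    }

    protected override IActivityExecutionResult OnExecute(ActivityExecutionContext context)
    {
        // Copy the IDs into locals so Hangfire serializes plain strings, not the context.
        var workflowInstanceId = context.WorkflowInstance.Id;
        var activityId = context.ActivityId;

        // Enqueueing is quick, so the message lock is released well within its duration.
        _jobClient.Enqueue<LongRunningActivity>(x => x.DoWorkAsync(workflowInstanceId, activityId));

        // Suspend the workflow until the background job resumes it.
        return Suspend();
    }

    // Runs when the workflow is resumed at this activity.
    protected override IActivityExecutionResult OnResume(ActivityExecutionContext context) => Done();

    // Executed by Hangfire, outside the Service Bus message handler.
    public async Task DoWorkAsync(string workflowInstanceId, string activityId)
    {
        await Task.Delay(TimeSpan.FromMinutes(10)); // stand-in for the real long-running work

        // Ask Elsa to resume the suspended workflow at this activity.
        await _dispatcher.DispatchAsync(new ExecuteWorkflowInstanceRequest(workflowInstanceId, activityId));
    }
}
```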
The above code targets Elsa 2.
Notice that the method executed by Hangfire lives inside of the activity class itself, but this is not mandatory - you could easily move this to a separate “job” class if you want.
In Elsa 3, we can simplify the above activity as follows:
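A sketch of what that might look like, assuming Elsa 3's `CodeActivity` base class and the `Kind` property on its `Activity` attribute. The class name and attribute values are made up for illustration, and the namespaces have moved between Elsa 3 releases, so adjust them to your version.

```csharp
using System;
using System.Threading.Tasks;
using Elsa.Workflows;             // Elsa.Workflows.Core in earlier Elsa 3 builds
using Elsa.Workflows.Attributes;  // Elsa.Workflows.Core.Attributes in earlier builds

// Hypothetical activity; the namespace, category and description strings are placeholders.
[Activity("Demo", "Demo", "Performs long-running work as a background job.", Kind = ActivityKind.Job)]
public class LongRunningJob : CodeActivity
{
    protected override async ValueTask ExecuteAsync(ActivityExecutionContext context)
    {
        // Stand-in for the real long-running work.
        await Task.Delay(TimeSpan.FromMinutes(10), context.CancellationToken);
    }
}
```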
Notice that the Elsa 3 activity is of the “Job” activity kind. The (default) workflow execution pipeline sees this and takes care of scheduling the background job for you, as opposed to the Elsa 2 example, where the activity needs to do this itself.
Hangfire runs on the same nodes, in the same application.