Error with Azure ServiceBus

See original GitHub issue

My team is moving from Redis and RabbitMQ to Azure Service Bus, with Azure Storage for distributed locking. With the new config pointing to the Azure services, we are seeing errors like the one below at random on the nodes. We have also noticed that after we update a workflow, some nodes do not pick up the change and keep running the older version of the workflow. Is there something we should be looking at to diagnose these issues?

We have 7 nodes running as Linux containers on-prem.

    builder.Services.AddElsa(options =>
        options.AddFeatures(new[] { typeof(Rentlyzer.Workflow.StartUp) }, builder.Configuration)
        #region Distributed Lock Provider
        //.ConfigureDistributedLockProvider(options => options.UseRedisLockProvider())
        .ConfigureDistributedLockProvider(options => options.UseAzureBlobLockProvider(new Uri(builder.Configuration.GetConnectionString("ElsaAzureLocking")!)))
        #endregion
        #region Service Bus Broker
        .UseAzureServiceBus(builder.Configuration.GetConnectionString("ElsaServiceBus")!)
        .UseRebusCacheSignal()
        //.PurgeAzureSubscriptionOnStartup(builder.Configuration.GetConnectionString("ElsaServiceBus")!)
        //.UseRabbitMq($"amqp://admin:justhitenter@{rabbitMQHostname}:{rabbitMQPort}")
        #endregion
    );

Rebus.Workers.ThreadPoolBased.ThreadPoolWorker[0]
An error occurred when attempting to complete the transaction context
Rebus.Exceptions.RebusApplicationException: Could not complete message with ID c0d6fece-6fb8-4bf2-99c8-6c52f3a33e49 and lock token 8089ce40-f405-4e8f-906f-4db95b47924b
---> Azure.Messaging.ServiceBus.ServiceBusException: The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue. For more information please see https://aka.ms/ServiceBusExceptions . Reference:084c036d-c7e7-45f1-96c4-c5405b6a6dc1, TrackingId:f6c4058400000004006a0595649f2c0c_G18_B22, SystemTracker:G18:15106129:amqps://....ws.servicebus.windows.net/-85fe418f;54:60:62:source(address:/execute-workflow-definition-request-default,filter:[]), Timestamp:2023-06-30T19:33:38 (MessageLockLost). For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot.
 at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.DisposeMessageAsync(Guid lockToken, Outcome outcome, TimeSpan timeout)
 at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.CompleteInternalAsync(Guid lockToken, TimeSpan timeout)
 at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.<>c.<<CompleteAsync>b__43_0>d.MoveNext() 
--- End of stack trace from previous location ---
 at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.<>c__22`1.<<RunOperation>b__22_0>d.MoveNext()
--- End of stack trace from previous location ---
 at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1,TResult](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken, Boolean logRetriesAsVerbose)
 at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1,TResult](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken, Boolean logRetriesAsVerbose)
 at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken)
 at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.CompleteAsync(Guid lockToken, CancellationToken cancellationToken)
 at Azure.Messaging.ServiceBus.ServiceBusReceiver.CompleteMessageAsync(ServiceBusReceivedMessage message, CancellationToken cancellationToken)
 at Rebus.AzureServiceBus.AzureServiceBusTransport.<>c__DisplayClass37_0.<<Receive>b__0>d.MoveNext() 
--- End of inner exception stack trace ---
 at Rebus.AzureServiceBus.AzureServiceBusTransport.<>c__DisplayClass37_0.<<Receive>b__0>d.MoveNext() 
--- End of stack trace from previous location ---
 at Rebus.Transport.TransactionContext.InvokeAsync(Func`2 actions)
 at Rebus.Transport.TransactionContext.RaiseCompleted()
 at Rebus.Transport.TransactionContext.Complete()
 at Rebus.Workers.ThreadPoolBased.ThreadPoolWorker.ProcessMessage(TransactionContext context, TransportMessage transportMessage)

Issue Analytics

  • State: open
  • Created 3 months ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
sfmskywalker commented, Jul 4, 2023

Indeed. And it is not just any single activity: the overall time a burst of execution takes needs to stay below the lock duration threshold.
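To see what that threshold currently is, the lock duration can be read from the queue itself. The following is only a minimal sketch, not something taken from the Elsa or Rebus setup in this issue. It assumes the `Azure.Messaging.ServiceBus` package, a connection string with Manage rights supplied through a hypothetical `ELSA_SERVICE_BUS` environment variable, and that the address in the stack trace (`execute-workflow-definition-request-default`) is a queue in that namespace.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus.Administration;

public static class LockDurationCheck
{
    public static async Task Main()
    {
        // Assumption: the connection string comes from a hypothetical environment variable.
        var connectionString = Environment.GetEnvironmentVariable("ELSA_SERVICE_BUS")
            ?? throw new InvalidOperationException("Set ELSA_SERVICE_BUS first.");

        // Assumption: queue name copied from the address in the stack trace.
        const string queueName = "execute-workflow-definition-request-default";

        var admin = new ServiceBusAdministrationClient(connectionString);

        // Read the queue's current settings, including its peek-lock duration.
        QueueProperties queue = await admin.GetQueueAsync(queueName);
        Console.WriteLine($"Current lock duration: {queue.LockDuration}");

        // Service Bus caps the lock duration at 5 minutes.
        var desired = TimeSpan.FromMinutes(5);
        if (queue.LockDuration < desired)
        {
            queue.LockDuration = desired;
            await admin.UpdateQueueAsync(queue);
            Console.WriteLine($"Lock duration raised to {desired}.");
        }
    }
}
```

Raising the lock duration only buys headroom up to the 5 minute cap, and the transport may apply its own queue settings when it creates queues, so work that can run longer still needs the approach described next.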

As a general rule of thumb, activities should execute quickly. Activities that require more time should schedule a background job and suspend themselves. Once the background job completes, it should then resume the activity.

This keeps the burst from exceeding the peek-lock timeout, since scheduling work with e.g. Hangfire is a quick operation.

Here’s an example of an activity that uses Hangfire to schedule work in the background, suspends itself, and gets resumed once the job completes:

[Action(Category = "Blockchain", Description = "Index the blockchain", Outcomes = new[] { OutcomeNames.Done })]
public class IndexBlockchain : Activity
{
    private readonly IBackgroundJobClient _backgroundJobClient;
    private readonly IWorkflowInstanceDispatcher _workflowDispatcher;
    private readonly IBlockchainIndexer _blockchainIndexer;

    public record IndexBlockchainContext(string WorkflowInstanceId, string ActivityId);
    
    public IndexBlockchain(
        IBackgroundJobClient backgroundJobClient, // Hangfire service to schedule jobs.
        IWorkflowInstanceDispatcher workflowDispatcher, // Elsa service to resume workflow.
        IBlockchainIndexer blockchainIndexer) // Service to index the blockchain.
    {
        _backgroundJobClient = backgroundJobClient;
        _workflowDispatcher = workflowDispatcher;
        _blockchainIndexer = blockchainIndexer;
    }

    // This method is run by Hangfire and can take a long time to complete.
    public async Task IndexBlockchainAsync(IndexBlockchainContext context, CancellationToken cancellationToken)
    {
        // Do the long-running work.
        await _blockchainIndexer.IndexBlockchainAsync();
        
        // Resume the workflow.
        await _workflowDispatcher.DispatchAsync(new ExecuteWorkflowInstanceRequest(context.WorkflowInstanceId, context.ActivityId), cancellationToken);
    }

    protected override IActivityExecutionResult OnExecute(ActivityExecutionContext context)
    {
        // Schedule the job.
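        // Enqueue is the Hangfire extension that takes only a method-call expression; Create also requires a job state.
        _backgroundJobClient.Enqueue<IndexBlockchain>(x => x.IndexBlockchainAsync(
            new IndexBlockchainContext(context.WorkflowInstance.Id, context.ActivityId),
            CancellationToken.None));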

        // Suspend this activity.
        return Suspend();
    }

    // Called when the workflow is resumed.
    protected override IActivityExecutionResult OnResume(ActivityExecutionContext context)
    {
        // Job's done.
        return Done();
    }
}

The above code targets Elsa 2.

Notice that the method executed by Hangfire lives inside of the activity class itself, but this is not mandatory - you could easily move this to a separate “job” class if you want.

In Elsa 3, we can simplify the above activity as follows:

[Activity("Acme", "Blockchain", "Index the blockchain", Kind = ActivityKind.Job)]
public class IndexBlockchain : Activity
{
    protected override async ValueTask ExecuteAsync(ActivityExecutionContext context)
    {
        var blockchainIndexer = context.GetRequiredService<IBlockchainIndexer>();
        
        // Do the long-running work.
        await blockchainIndexer.IndexBlockchainAsync();
    }
}

Notice that the Elsa 3 activity is of the “Job” activity kind. The (default) workflow execution pipeline sees this and takes care of scheduling the background job for you, as opposed to the Elsa 2 example, where the activity has to do this itself.

0 reactions
sfmskywalker commented, Jul 4, 2023

Hangfire runs on the same nodes, in the same application.
