[QUERY] concurrency in batching

Library name and version

Azure.Messaging.ServiceBus 7.13.1

Query/Question

The following methods seem to be extremely slow:

  • (await sender.CreateMessageBatchAsync(cancellationToken)).TryAddMessage(...)
  • SendMessagesAsync

This shows up in a scenario like the one below, which uses recursion to build a list of safe batches and then send them in parallel.

        public async Task SendBatchesAsync(
            string topicOrQueueName,
            List<MessageBatch> messages,
            CancellationToken cancellationToken = default)
        {
            ...
            // illustrative purpose only - client and sender are DI'ed from the container
            var clientOptions = new ServiceBusClientOptions
            {
                TransportType = ServiceBusTransportType.AmqpWebSockets
            };
            var client = new ServiceBusClient(
                "xxxxxx.servicebus.windows.net",
                new DefaultAzureCredential(),
                clientOptions);
            var sender = client.CreateSender(topicOrQueueName);

            var result = await CreateBatchesSender(
                sender,
                messages,
                new List<ServiceBusMessageBatchWrapper>(),
                new ServiceBusBatchResult() { Failure = new ServiceBusSendBatchException("batch insert failures") },
                indexPointer: 0,
                cancellationToken);

            // Calling DisposeAsync on client types is required to ensure that network
            // resources and other unmanaged objects are properly cleaned up.
            await sender.DisposeAsync();
            await client.DisposeAsync();

            if (!result.AllSucceeded)
            {
                throw result.Failure;
            }
        }

        private async Task<ServiceBusBatchResult> CreateBatchesSender(
            ServiceBusSender sender,
            List<MessageBatch> messages,
            List<ServiceBusMessageBatchWrapper> batches,
            ServiceBusBatchResult result,
            int indexPointer,
            CancellationToken cancellationToken)
        {
            var currentList = messages.Skip(indexPointer).ToList();

            if (currentList.Count == 0)
            {
                // All messages have been consumed: flush the batch added by the
                // previous call, which would otherwise never be sent
                if (batches.Count > 0)
                {
                    result = await CaptureException(
                        sender.SendMessagesAsync(batches.Last().Batch, cancellationToken),
                        batches.Last().Indexes,
                        result);
                }

                return result;
            }

            // Capture the previous batch before the new one is added below, then
            // start sending it whilst the new one is building
            var previousBatch = batches.Count > 0 ? batches.Last() : null;
            Task execSend = previousBatch is not null
                ? sender.SendMessagesAsync(previousBatch.Batch, cancellationToken)
                : Task.CompletedTask;

            int index = indexPointer;

            _logger.LogDebug("creating batcher...");

            var customBatch = new ServiceBusMessageBatchWrapper
            {
                Batch = await sender.CreateMessageBatchAsync(cancellationToken),
            };

            _logger.LogDebug("finished creating batch sender, starting to process...");

            for (int b = 0; b < currentList.Count; b++)
            {
                if (!customBatch.Batch.TryAddMessage(BuildServiceBusMessage(currentList[b].Payload, currentList[b].Context)))
                {
                    // do not increment index
                    _logger.BatchSizeExceeded(index);
                    break;
                }
                customBatch.AddIndex(index);
                index++;
            }

            batches.Add(customBatch);

            return await CreateBatchesSender(
                sender,
                messages,
                batches,
                await CaptureException(execSend, previousBatch?.Indexes, result),
                index,
                cancellationToken);
        }

        internal static ServiceBusMessage BuildServiceBusMessage(
            string payload,
            IMessageContext? messageContext)
        {
            var serviceBusMessage = new ServiceBusMessage(new BinaryData(payload))
            {
                ContentType = ApplicationJson,
                CorrelationId = messageContext?.CorrelationId ?? "",
                Subject = messageContext?.Label,
                ScheduledEnqueueTime = messageContext?.ScheduledEnqueueTimeUtc ?? DateTimeOffset.UtcNow
            };
            serviceBusMessage.AddApplicationProperties(messageContext?.CustomProperties);

            return serviceBusMessage;
        }

        /// <summary>
        ///     Awaits the background send operation and captures any exception,
        ///     recording it against the indexes of the batch being sent
        /// </summary>
        /// <param name="task">The in-flight send operation to await</param>
        /// <param name="indexes">Indexes of the messages in the batch being sent</param>
        /// <param name="result">The accumulated result to record failures against</param>
        /// <returns>The updated result</returns>
        internal static async Task<ServiceBusBatchResult> CaptureException(Task task, List<int> indexes, ServiceBusBatchResult result)
        {
            try
            {
                await task;
            }
            catch (Exception ex)
            {
                result.Failure.Exceptions.Add((indexes, ex));
                result.AllSucceeded = false;
            }
            return result;
        }

Looking at the App Insights dependency analysis, it seems TryAddMessage is not especially fast, and more importantly await sender.CreateMessageBatchAsync(cancellationToken) takes anywhere between 1 and 3 seconds.

If you are looping through 18k items split into 62 batches, this runs 62 times, which on its own contributes roughly 2 minutes of run time.

The sender is created above and scoped to a specific queue, and it is my understanding that at this point the AMQP connection is established and simply re-used - in this case inside the recursive call.

For reference, building a List<T> of 18k items, with serialization of the “payload” into a string, takes under 1 second, as expected.

The same series of operations in the Go SDK takes roughly 5 seconds for a 50k file, and that includes downloading the file from Blob Storage.

Any thoughts/pointers would be welcomed.

PS: some implementations returned a list of safe batches and then called Task.WhenAll(sender.SendMessagesAsync(...)), but this was timing out after a minute…

PS2: AWS/GCP both include an errorList in their response for batched operations - maybe something like this could be added here.

Environment

.NET SDK:
 Version:   7.0.201
 Commit:    68f2d7e7a3

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  12.6
 OS Platform: Darwin
 RID:         osx.12-x64
 Base Path:   /usr/local/share/dotnet/sdk/7.0.201/

Host:
  Version:      7.0.3
  Architecture: x64
  Commit:       0a2bda10e8

Visual Studio for Mac 17.5.1 (build 23)

Issue Analytics

  • State: closed
  • Created 6 months ago
  • Comments:6 (2 by maintainers)

Top GitHub Comments

1 reaction
jsquire commented, Apr 3, 2023

Hi @dnitsch. Thank you for the additional context. I think that I now understand where some issues may be, though we’d need to capture SDK logs for a 5-minute time slice around the issue to be sure.

When you’re calling SendMessagesAsync on a single sender, you’re attempting to transmit each batch over the same AMQP link. Though you can make the calls concurrently, operations on a link are queued, so only one send is outstanding at a time, completing when the service acknowledges receipt. Each operation uses the TryTimeout configured on the client to govern how long it is allowed to remain active. By default, this is 60 seconds; unless all of your sends complete within that time, they will fail with a timeout and Task.WhenAll will observe a faulted task, causing it to throw.
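
For illustration, the timeout lives on ServiceBusClientOptions.RetryOptions, so extending it looks roughly like the sketch below. The two-minute value is an arbitrary placeholder, not a recommendation from this thread.

    var clientOptions = new ServiceBusClientOptions
    {
        TransportType = ServiceBusTransportType.AmqpWebSockets,
        RetryOptions = new ServiceBusRetryOptions
        {
            // Default is 60 seconds; a send must complete within this window,
            // including time spent queued behind other sends on the same AMQP
            // link, or its task faults with a timeout.
            TryTimeout = TimeSpan.FromMinutes(2)
        }
    };
    var client = new ServiceBusClient(
        "xxxxxx.servicebus.windows.net",
        new DefaultAzureCredential(),
        clientOptions);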

With 16 virtual cores, the host machine can perform only 16 operations concurrently. Starting 62 concurrent tasks means that you’ll potentially see continuations for async operations getting queued and waiting to resume. Since there is no fairness in scheduling, even when the system is lively and continuing to make forward progress, some tasks may end up running longer than their timeout while waiting to be scheduled. Scenarios that trigger retries, such as throttling and transient failures, may exacerbate this.
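
As a hypothetical sketch of bounding that pressure (batches, sender, and cancellationToken are stand-ins for the names used in the issue), a SemaphoreSlim caps how many sends are in flight at once:

    // Roughly 2:1 tasks to virtual cores as a starting point; tune under
    // real-world conditions.
    int maxConcurrentSends = Environment.ProcessorCount * 2;
    using var throttle = new SemaphoreSlim(maxConcurrentSends);

    var sendTasks = batches.Select(async batch =>
    {
        await throttle.WaitAsync(cancellationToken);
        try
        {
            // Note: against a single sender these still queue on one AMQP
            // link; pair this with multiple senders (see the sketch further
            // below) for true concurrency.
            await sender.SendMessagesAsync(batch.Batch, cancellationToken);
        }
        finally
        {
            throttle.Release();
        }
    }).ToList();

    await Task.WhenAll(sendTasks);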

RE: the clarification - maybe I just need to clarify my own understanding: if I create a “safe batch”, i.e. one built via TryAddMessage, and send it and an exception occurs, is it safe to assume that no messages from that batch were inserted?

Yes, each SendAsync call - regardless of whether a ServiceBusMessageBatch is used or not - is atomic. All messages will either succeed or fail as one unit. However, it is important to note that this is NOT true of the Task.WhenAll that you’re using. Each of those SendAsync tasks is independent and will succeed or fail atomically, but you have the potential for partial success across tasks.
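
A rough sketch of surfacing that partial success rather than letting Task.WhenAll discard the detail; sendTasks and _logger are hypothetical names from the sketches here, not SDK types:

    try
    {
        await Task.WhenAll(sendTasks);
    }
    catch
    {
        // Swallow here; each task is inspected individually below.
    }

    for (int i = 0; i < sendTasks.Count; i++)
    {
        if (sendTasks[i].IsFaulted)
        {
            // This batch failed atomically; other batches may have succeeded.
            _logger.LogWarning(
                sendTasks[i].Exception?.GetBaseException(),
                "batch {BatchIndex} failed to send", i);
        }
    }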

Recommendations

  • Take a peek at Best Practices for performance improvements using Service Bus Messaging, which discusses some high-level considerations around Azure resources.

  • Consider the amount of concurrency that you need against the available resources in the host. It varies greatly by the application, host environment, and workload. The current 4:1 ratio of tasks to virtual cores may or may not be ideal for achieving the throughput that you’re looking for. We generally recommend starting with a 2:1 ratio and testing under real-world conditions to find the best balance for your workload.

  • If you’re intending to perform concurrent sends, then you’ll want to create the same number of senders as the degree of concurrency that you’d like (a minimal sketch follows below). Each will create a dedicated AMQP link, allowing them to transmit concurrently - within the limits of the network and service. Depending on the degree of concurrency you select, you may want to create additional ServiceBusClient instances to spread sends out across connections for better throughput. Since throughput will vary depending on a number of factors, it is recommended that you test with your application to discover the best balance.
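
A minimal sketch of that fan-out, assuming the client, topicOrQueueName, batches, and cancellationToken from the code in the question; the degree of concurrency is a placeholder to tune:

    int degreeOfConcurrency = 4; // placeholder; test to find your balance
    var senders = Enumerable.Range(0, degreeOfConcurrency)
        .Select(_ => client.CreateSender(topicOrQueueName))
        .ToList();

    // Round-robin batches across senders so each dedicated AMQP link
    // carries roughly an equal share of the sends.
    var sendTasks = batches
        .Select((batch, i) => senders[i % degreeOfConcurrency]
            .SendMessagesAsync(batch.Batch, cancellationToken))
        .ToList();

    await Task.WhenAll(sendTasks);

    foreach (var s in senders)
    {
        await s.DisposeAsync();
    }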

I’m going to mark this as addressed, but please feel free to unresolve if you’d like to continue the discussion.

0 reactions
dnitsch commented, Apr 3, 2023

thanks @jsquire - nice and informative.

I’ll re-paste the link from above, as the hyperlink has a typo in it: Best Practices for performance improvements using Service Bus Messaging

Here is the specific section talking about creating multiple senders, which, like me, people may have missed 😄
