Guidance around AllowBulkExecution

See original GitHub issue

At what point does it make sense to enable AllowBulkExecution? Does it have any impact on reads?

At the moment we’re uploading just over 100 records to one container (indexed, costing about 1,600 RUs) and another 100 key-value records to another container (non-indexed, about 1,300 RUs; for comparison, the indexed equivalent is about 6,500 RUs). The containers are written to from two different threads. The items are created using this approach:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

Container container = cosmosClient.GetContainer("myDb", "myCollection");

// Assuming you have your data available to be inserted or read
List<Task> concurrentTasks = new List<Task>();
foreach (Item itemToInsert in ReadYourData())
{
    concurrentTasks.Add(container.CreateItemAsync(itemToInsert, new PartitionKey(itemToInsert.MyPk)));
}

await Task.WhenAll(concurrentTasks);
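
For reference, bulk mode itself is switched on when the client is constructed, via CosmosClientOptions.AllowBulkExecution; a minimal sketch (connectionString is assumed to be defined elsewhere):

CosmosClient cosmosClient = new CosmosClient(
    connectionString,
    new CosmosClientOptions { AllowBulkExecution = true }); // bulk mode on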

When developing locally against the Cosmos Emulator, it executes about 4-5 times slower if I set AllowBulkExecution to true. When running in debug mode I’m seeing "DocDBTrace Information: 0 : Batch is full - Max operation count 100 reached."

When writing to real Cosmos in Azure, the difference isn’t as pronounced (network latency may be masking it, since the function runs locally), but AllowBulkExecution=true is still slower.

Am I correct that for this dataset it makes sense to leave AllowBulkExecution at its default (false)? In the future we will need to handle different, much larger and more numerous datasets. When will it make sense to enable AllowBulkExecution?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

1 reaction
Camios commented, Jun 10, 2020

@ealsur we have a private API that uploads data once or twice a day and a public API serving reads throughout the day. To start off with, the throughput was the minimum 400 RUs, but the upload (with bulk) was getting rate limited and returning 429 errors after CreateItem calls exceeded max retries. So it appears bulk utilises more than all of the available throughput, to the point of failure; i.e. bulk isn’t there to play safe within the throughput limits. Although, I initially (mis)read the previous comment as saying bulk might work to stay within the container throughput.
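
One knob the SDK does expose for this situation is its built-in 429 retry behaviour, also on CosmosClientOptions; a minimal sketch, with the retry values picked purely for illustration (the defaults are 9 attempts and 30 seconds of total wait, and connectionString is again assumed to exist):

CosmosClientOptions options = new CosmosClientOptions
{
    AllowBulkExecution = true,
    // Trade upload latency for fewer surfaced 429s:
    MaxRetryAttemptsOnRateLimitedRequests = 20,                        // default 9
    MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(120)  // default 30s
};
CosmosClient client = new CosmosClient(connectionString, options);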

My guess is there also needs to be a minimum bound on the throughput based on max op/s * RU/op.

But how does one determine a container’s suitable throughput, to avoid the rate limiting, given we don’t control the factors that affect the max ops/s? Assume we can know the RU/op ahead of time.
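
To make that arithmetic concrete, here is a rough back-of-the-envelope using the numbers from the question above (the assumption that a full batch can be dispatched within one second is ours):

    RU per op        ≈ 1300 RU / 100 items  ≈ 13 RU
    ops per dispatch ≈ 100 (per the "Max operation count 100 reached" trace)
    peak demand      ≈ 100 op/s * 13 RU/op  = 1300 RU/s

That is more than three times a 400 RU/s container, so sustained 429s are exactly what you would expect.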

Otherwise, switch to autoscale throughput? But that carries a 50% cost premium per RU, which seems unfair as an alternative, given that picking a fixed throughput is a guess due to factors outside of our control.

If you get the fixed throughput wrong, you either encounter max-retry errors or excessive cost. So I wonder: can the client SDK provide an alternative, let’s call it “robust” throttling, for scenarios where the upload finishing is more important than how long it takes, so that it doesn’t exceed the container throughput and hit 429 errors? This seems possible, given the SDK could already be aware of the container throughput (or be informed of it by the caller), could observe the RU cost of each item client-side (or use item streams or batching), and could back off when retries are encountered.
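
Nothing like this exists in the SDK today; below is a minimal sketch of what such client-side pacing could look like, writing sequentially and using each response’s actual RequestCharge to stay under a caller-chosen RU budget. The ThrottledImporter/ImportAsync names and the one-second window are assumptions of the sketch, not SDK features:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class ThrottledImporter
{
    // Writes items one at a time, pausing whenever the RU spend in the
    // current one-second window reaches the caller's budget. The SDK's
    // built-in retries still apply if an individual call is throttled.
    public static async Task ImportAsync<T>(
        Container container,
        IEnumerable<(T Item, string PartitionKeyValue)> items,
        double ruBudgetPerSecond) // e.g. 300 out of a 400 RU/s container
    {
        double ruSpent = 0;
        Stopwatch window = Stopwatch.StartNew();

        foreach ((T item, string pk) in items)
        {
            ItemResponse<T> response =
                await container.CreateItemAsync(item, new PartitionKey(pk));

            // RequestCharge is the actual RU cost reported by the service.
            ruSpent += response.RequestCharge;

            if (ruSpent >= ruBudgetPerSecond)
            {
                // Budget exhausted: wait out the rest of the window.
                TimeSpan remaining = TimeSpan.FromSeconds(1) - window.Elapsed;
                if (remaining > TimeSpan.Zero)
                {
                    await Task.Delay(remaining);
                }
                ruSpent = 0;
                window.Restart();
            }
        }
    }
}

Sequential writes are deliberate here: with exactly one operation in flight, RU accounting stays trivial at the cost of throughput, which is the stated trade-off when finishing matters more than speed.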

0 reactions
msftbot[bot] commented, Dec 15, 2021

Closing due to inactivity, please feel free to re-open.

Read more comments on GitHub >

Top Results From Across the Web

CosmosClientOptions.AllowBulkExecution Property
Allows optimistic batching of requests to service. Setting this option might impact the latency of the operations. Hence this option is recommended for ...

Troubleshoot an application using the Azure Cosmos DB ...
In this lab, we'll create a menu driven program that will allow us to insert or delete one of two documents. The main ...

c# - .NET Core API: PartitionKey extracted from document ...
I have created a simple API, along with Swagger API documentation, which will operate on a single entity. So far I have done ...

Controlling an Application's Throughput Consumption ... - David
In Cosmos DB, this type of behaviour can show up as RU exhaustion, where because one process is grabbing too many RUs ...

Cosmos DB | Lenni's Technology Blog
Let's use a multi-tenant scenario to explain and understand hierarchical partition keys. Each tenant has user documents, and since each tenant ...
