Cosmos SDK throws 408 CosmosException, but operation succeeded in the database

Description

Using the v4 preview.

When a request takes a long time to return a response, the SDK throws an exception with status code 408 (Request Timeout). However, if you then look in the database, the operation actually succeeded.

In our case, we insert a new item into the database. Sometimes we get the 408 exception, but when we retry we get a 409 (Conflict) response from the database, saying an entity with this id already exists. This is because the operation actually succeeded on the database; the SDK just threw a timeout.

If you set a breakpoint where the exception is thrown and, when it is hit, look in the database, you will see that the document exists (and thus was created by the request that threw the 408 exception).

To Reproduce

We have the following code:

var policyResult = await _retryOnHeavyLoadPolicy.ExecuteAsync(async () =>
{
    await store.CreateItemAsync(entity, new PartitionKey(entity.Discriminator), null, cancellationToken);

    return Task.CompletedTask;
}).ConfigureAwait(false);

Where the heavy load retry policy is a Polly.Net policy defined as:

Policy
    .Handle<Exception>(x =>
        x is CosmosException
        && (((CosmosException)x).Status == (int)HttpStatusCode.TooManyRequests
            || ((CosmosException)x).Status == (int)HttpStatusCode.RequestTimeout))
    .WaitAndRetryForeverAsync(retryAttempt => TimeSpan.FromSeconds(Math.Pow(retryAttempt, 0.5)));

This policy retries the create call if we get a 429 (Too Many Requests) or a 408 (Request Timeout) exception.
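To make the wait times of that policy concrete, here is a small sketch that evaluates the same delay formula outside of Polly. `BackoffSketch`, `DelayFor` and `FirstDelays` are hypothetical names introduced for illustration, and the sketch assumes Polly passes a 1-based `retryAttempt` to the delay function.

```csharp
using System;
using System.Linq;

public static class BackoffSketch
{
    // Same formula as the policy above: sqrt(retryAttempt) seconds.
    public static TimeSpan DelayFor(int retryAttempt) =>
        TimeSpan.FromSeconds(Math.Pow(retryAttempt, 0.5));

    // Hypothetical helper: delays for the first `count` retries
    // (attempt numbers assumed to start at 1).
    public static TimeSpan[] FirstDelays(int count) =>
        Enumerable.Range(1, count).Select(DelayFor).ToArray();
}
```

The delay grows slowly: about 1 second before the first retry, 2 seconds before the fourth and 10 seconds before the hundredth. Note that the policy only handles 429 and 408, so the 409 returned by a retried create is not retried and surfaces to the caller.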

To reproduce you will need a pretty heavy load and a bad / slow internet connection though.

Environment summary

SDK Version: v4.0.0-preview3
OS Version: Windows 10

Question

Due to our retry policy, when creating an item we sometimes get a 408, retry, and then get a 409. How should we handle 408 exceptions in cases where the item really was created in the database (or updated, deleted, or any other action)? Simply retrying is not an option, because that causes a 409.
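One possible way to handle the 408 then 409 sequence, shown only as a sketch and not as the SDK's or the maintainers' recommended answer, is to remember that an earlier attempt timed out and to treat a following 409 as confirmation that the write landed. To stay runnable without the SDK, the sketch uses hypothetical stand-ins: `StoreException` plays the role of `CosmosException`, and `FlakyStore` is an in-memory store whose first create is persisted but reports a 408.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for CosmosException so the sketch runs without the SDK.
public class StoreException : Exception
{
    public int Status { get; }
    public StoreException(int status) : base($"Status {status}") => Status = status;
}

// Hypothetical in-memory store: the first create is persisted but reports a 408.
public class FlakyStore
{
    private readonly HashSet<string> _ids = new HashSet<string>();
    private bool _timedOutOnce;

    public Task CreateItemAsync(string id)
    {
        if (!_ids.Add(id)) throw new StoreException(409); // id already exists
        if (!_timedOutOnce)
        {
            _timedOutOnce = true;
            throw new StoreException(408); // write landed, response was lost
        }
        return Task.CompletedTask;
    }

    public bool Contains(string id) => _ids.Contains(id);
}

public static class IdempotentCreate
{
    // Returns the number of attempts made. A 409 counts as success only
    // if an earlier attempt for the same id ended in a 408.
    public static async Task<int> CreateAsync(FlakyStore store, string id, int maxAttempts = 5)
    {
        bool sawTimeout = false;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await store.CreateItemAsync(id);
                return attempt;
            }
            catch (StoreException ex) when (ex.Status == 408)
            {
                sawTimeout = true; // outcome unknown, so retry
            }
            catch (StoreException ex) when (ex.Status == 409 && sawTimeout)
            {
                return attempt; // the earlier write most likely succeeded
            }
        }
        throw new TimeoutException($"Create of '{id}' still timing out after {maxAttempts} attempts.");
    }
}
```

A 409 that arrives without a preceding 408 still propagates, since it then signals a genuine conflict. If another writer could create the same id concurrently, a 409 after a timeout is not proof that our own write succeeded; in that case it may be safer to read the item back and compare it with what was sent.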

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 16 (9 by maintainers)

Top GitHub Comments

ealsur commented, May 27, 2020 (1 reaction)

The main item is created and never modified across the system. The partitionKey is set only once. Notice that this behavior is not always apparent; it happens about 70% of the time. Even though everything happens in parallel, I verified that each path of this particular issue is thread safe. I didn’t try printing, as the output would be a mess since this specific ExtensionMethod is called so many times. So if you have ideas, I’m all ears.

Try assigning local variables instead of using item.Id and item.PartitionKey directly, and do the same for the end result comparison. Your code is already doing a `Console.Write`, which is why I thought you could add those values to the same write.

Basically:

string partitionKey = item.PartitionKey;
string id = item.Id;
using (ResponseMessage responseMessage = await container.ReadItemStreamAsync(id, new PartitionKey(partitionKey)))
{
    // Deserialize and compare the response's id and PK
}

Most probably it is on the service side. You guys can reach out to the correct team; I’m unable to.

If this is the case, a support ticket needs to be raised.

ealsur commented, Nov 16, 2022 (0 reactions)

Closing due to inactivity.
