question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

CosmosDb client throwing timeout Exception

See original GitHub issue

We are using “Microsoft.Azure.Cosmos” Version=“3.23.0” in Net Core application. After migrating to the newer client SDK, we are seeing intermittent failures from Cosmos. To Create CosmosClient and container we are using default values for the time out and not specifying any custom value to override it. here is the stack trace of Exception

`

<html> <body>
The operation was canceled.CosmosDiagnostics: {“Summary”:{“DirectCalls”:{“(200, 0)”:1,“(204, 0)”:20},“RegionsContacted”:1},“name”:“ReplaceItemAsync”,“id”:“fcafa2a7-6c50-4de5-86e4-a89ea4e0d0aa”,“caller info”:{“member”:“OperationHelperWithRootTraceAsync”,“file”:“ClientContextCore.cs”,“line”:244},“start time”:“01:03:27:626”,“duration in milliseconds”:1907.6462,“data”:{“Client Configuration”:{“Client Created Time Utc”:“2022-02-02T03:37:50.3122530Z”,“NumberOfClientsCreated”:1,“User Agent”:“cosmos-netstandard-sdk/3.23.0|3.23.1|1|X64|Linux 5.4.0-1064-azure 67 18.|.NET Core 3.1.22|L|”,“ConnectionConfig”:{“gw”:“(cps:50, urto:10, p:False, httpf: False)”,“rntbd”:“(cto: 5, icto: -1, mrpc: 30, mcpe: 65535, erd: True, pr: ReuseUnicastPort)”,“other”:“(ed:False, be:False)”},“ConsistencyConfig”:“(consistency: NotSet, prgns:[westus])”},“Operation Cancelled Exception”:“System.OperationCanceledException: The operation was canceled.\n at System.Threading.CancellationToken.ThrowOperationCanceledException()\n at Microsoft.Azure.Documents.TimeoutHelper.ThrowGoneIfElapsed()\n at Microsoft.Azure.Documents.StoreReader.ReadMultipleReplicaAsync(DocumentServiceRequest entity, Boolean includePrimary, Int32 replicaCountToRead, Boolean requiresValidLsn, Boolean useSessionToken, ReadMode readMode, Boolean checkMinLSN, Boolean forceReadAll)\n at Microsoft.Azure.Documents.ConsistencyWriter.WaitForWriteBarrierAsync(DocumentServiceRequest barrierRequest, Int64 selectedGlobalCommittedLsn)\n at Microsoft.Azure.Documents.ConsistencyWriter.WritePrivateAsync(DocumentServiceRequest request, TimeoutHelper timeout, Boolean forceRefresh)\n at Microsoft.Azure.Documents.BackoffRetryUtility1.ExecuteRetryAsync(Func1 callbackMethod, Func3 callShouldRetry, Func1 inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action1 preRetryCallback)\n at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)\n at Microsoft.Azure.Documents.BackoffRetryUtility1.ExecuteRetryAsync(Func1 callbackMethod, Func3 callShouldRetry, Func1 inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action1 preRetryCallback)\n at Microsoft.Azure.Documents.ConsistencyWriter.WriteAsync(DocumentServiceRequest entity, TimeoutHelper timeout, Boolean forceRefresh, CancellationToken cancellationToken)\n at Microsoft.Azure.Documents.ReplicatedResourceClient.<>c__DisplayClass31_0.<<InvokeAsync>b__0>d.MoveNext()\n— End of stack trace from previous location where exception was thrown —\n at Microsoft.Azure.Documents.RequestRetryUtility.ProcessRequestAsync[TRequest,IRetriableResponse](Func1 executeAsync, Func1 prepareRequest, IRequestRetryPolicy2 policy, CancellationToken cancellationToken, Func1 inBackoffAlternateCallbackMethod, Nullable1 minBackoffForInBackoffCallback)\n at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)\n at Microsoft.Azure.Documents.RequestRetryUtility.ProcessRequestAsync[TRequest,IRetriableResponse](Func1 executeAsync, Func1 prepareRequest, IRequestRetryPolicy2 policy, CancellationToken cancellationToken, Func1 inBackoffAlternateCallbackMethod, Nullable1 minBackoffForInBackoffCallback)\n at Microsoft.Azure.Documents.StoreClient.ProcessMessageAsync(DocumentServiceRequest request, CancellationToken cancellationToken, IRetryPolicy retryPolicy, Func2 prepareRequestAsyncDelegate)\n at Microsoft.Azure.Cosmos.Handlers.TransportHandler.ProcessMessageAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.TransportHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.RouterHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.RequestHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.AbstractRetryHandler.ExecuteHttpRequestAsync(Func1 callbackMethod, Func3 callShouldRetry, Func3 callShouldRetryException, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.AbstractRetryHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.RequestHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.RequestHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Team.Services.Common.CosmosDb.Metrics.CosmosDbMetricsRequestHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.RequestHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler.SendAsync(String resourceUriString, ResourceType resourceType, OperationType operationType, RequestOptions requestOptions, ContainerInternal cosmosContainerCore, FeedRange feedRange, Stream streamPayload, Action1 requestEnricher, ITrace trace, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.ContainerCore.ProcessItemStreamAsync(Nullable1 partitionKey, String itemId, Stream streamPayload, OperationType operationType, ItemRequestOptions requestOptions, ITrace trace, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.ContainerCore.ExtractPartitionKeyAndProcessItemStreamAsync[T](Nullable1 partitionKey, String itemId, T item, OperationType operationType, ItemRequestOptions requestOptions, ITrace trace, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.ContainerCore.ReplaceItemAsync[T](T item, String id, ITrace trace, Nullable1 partitionKey, ItemRequestOptions requestOptions, CancellationToken cancellationToken)\n at Microsoft.Azure.Cosmos.ClientContextCore.RunWithDiagnosticsHelperAsync[TResult](ITrace trace, Func`2 task)”},“children”:[{“name”:“ItemSerialize”,“id”:“f1df4548-aca2-4c4c-b3d1-628be7929927”,“caller info”:{“member”:“ExtractPartitionKeyAndProcessItemStreamAsync”,“file”:“ContainerCore.Items.cs”,“line”:941},“start time”:“01:03:27:626”,“duration in milliseconds”:0.037},{“name”:“Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler”,“id”:“297adf96-e2e9-469b-8377-b9112b372d58”,“start time”:“01:03:27:626”,“duration in milliseconds”:1907.3794,“children”:[{“name”:“Team.Services.Common.CosmosDb.Metrics.CosmosDbMetricsRequestHandler”,“id”:“cde494c7-1337-410b-ab0f-d1d134bff1d0”,“start time”:“01:03:27:626”,“duration in milliseconds”:1907.2991,“children”:[{“name”:“Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler”,“id”:“fa46111b-3207-40f4-83f2-8df7f6c51000”,“start time”:“01:03:27:626”,“duration in milliseconds”:1906.8139,“children”:[{“name”:“Microsoft.Azure.Cosmos.Handlers.RetryHandler”,“id”:“ea078910-bd49-43d2-a2e5-bcf1f75eaa20”,“start time”:“01:03:27:626”,“duration in milliseconds”:1906.7529,“children”:[{“name”:“Microsoft.Azure.Cosmos.Handlers.RouterHandler”,“id”:“b97c43e4-7dd0-425b-ad9b-063d22c3f2e9”,“start time”:“01:03:27:626”,“duration in milliseconds”:1906.6648,“children”:[{“name”:“Microsoft.Azure.Cosmos.Handlers.TransportHandler”,“id”:“9eb400c9-7db3-42f0-a6ca-04b8ea09d665”,“start time”:“01:03:27:626”,“duration in milliseconds”:1906.6244,“children”:[{“name”:“Microsoft.Azure.Documents.ServerStoreModel Transport Request”,“id”:“d0dd59bc-daf8-4277-859d-a006f2c8b3fa”,“caller info”:{“member”:“ProcessMessageAsync”,“file”:“TransportHandler.cs”,“line”:109},“start time”:“01:03:27:626”,“duration in milliseconds”:1906.4729,“data”:{“Client Side Request Stats”:{“Id”:“AggregatedClientSideRequestStatistics”,“ContactedReplicas”:[{“Count”:1,<Removed for security>],“FailedReplicas”:[],“AddressResolutionStatistics”:[],“StoreResponseStatistics”:[{“ResponseTimeUTC”:“2022-02-03T01:03:27.6953683Z”,“ResourceType”:“Document”,“OperationType”:“Replace”,“LocationEndpoint”:“endpoint/”,“StoreResult”:{“ActivityId”:“82a312ab-765d-4362-abc1-257425ef55d5”,“StatusCode”:“Ok”,“SubStatusCode”:“Unknown”,“LSN”:9692316,“PartitionKeyRangeId”:“0”,“GlobalCommittedLSN”:9692310,“ItemLSN”:-1,“UsingLocalLSN”:false,“QuorumAckedLSN”:9692315,“SessionToken”:“0#9692316#7=-1”,“CurrentWriteQuorum”:3,“CurrentReplicaSetSize”:4,“NumberOfReadRegions”:1,“IsValid”:true,“StorePhysicalAddress”:“rntbd://:14068/apps/ba19ffe5-e50d-4d28-9b5d-c069d0e15f9a/services/f707bbef-ed4b-4815-82f7-ccd9e8b7f212/partitions/b691ef48-742a-4c3d-b1c5-1117beeb0a6d/replicas/132876734229942716p/”,“RequestCharge”:16,“RetryAfterInMs”:null,“BELatencyInMs”:“5.287”,“transportRequestTimeline”:{“requestTimeline”:[{“event”: “Created”, “startTimeUtc”: “2022-02-03T01:03:27.6264893Z”, “durationInMs”: 0.0027},{“event”: “ChannelAcquisitionStarted”, “startTimeUtc”: “2022-02-03T01:03:27.6264920Z”, “durationInMs”: 0.009},{“event”: “Pipelined”, “startTimeUtc”: “2022-02-03T01:03:27.6265010Z”, “durationInMs”: 0.122},{“event”: “Transit Time”, “startTimeUtc”: “2022-02-03T01:03:27.6266230Z”, “durationInMs”: 68.6575},{“event”: “Received”, “startTimeUtc”: “2022-02-03T01:03:27.6952805Z”, “durationInMs”: 0.0698},{“event”: “Completed”, “startTimeUtc”: “2022-02-03T01:03:27.6953503Z”, “durationInMs”: 0}],“requestSizeInBytes”:2902,“requestBodySizeInBytes”:851,“responseMetadataSizeInBytes”:500},“TransportException”:null}​},{“ResponseTimeUTC”:“2022-02-03T01:03:27.7595265Z”,“ResourceType”:“Collection”,“OperationType”:“Head”,“LocationEndpoint”:“https://”,“StoreResult”:{“ActivityId”:“82a312ab-765d-4362-abc1-257425ef55d5”,“StatusCode”:“NoContent”,“SubStatusCode”:“Unknown”,“LSN”:9692310,“PartitionKeyRangeId”:“0”,“GlobalCommittedLSN”:9692310,“ItemLSN”:8971340,“UsingLocalLSN”:false,“QuorumAckedLSN”:-1,“SessionToken”:“0#9692310#7=-1”,“CurrentWriteQuorum”:-1,“CurrentReplicaSetSize”:-1,“NumberOfReadRegions”:1,“IsValid”:true,“StorePhysicalAddress”:“rntbd://:/apps//services/f707bbef-ed4b-4815-82f7-ccd9e8b7f212/partitions/b691ef48-742a-4c3d-b1c5-1117beeb0a6d/replicas/132860494913145228s/”,“RequestCharge”:0,“RetryAfterInMs”:null,“BELatencyInMs”:“0.451”,“transportRequestTimeline”:{“requestTimeline”:[{“event”: “Created”, “startTimeUtc”: “2022-02-03T01:03:27.6956262Z”, “durationInMs”: 0.0027},{“event”: “ChannelAcquisitionStarted”, “startTimeUtc”: “2022-02-03T01:03:27.6956289Z”, “durationInMs”: 0.0026},{“event”: “Pipelined”, “startTimeUtc”: “2022-02-03T01:03:27.6956315Z”, “durationInMs”: 0.1074},{“event”: “Transit Time”, “startTimeUtc”: “2022-02-03T01:03:27.6957389Z”, “durationInMs”: 63.7201},{“event”: “Received”, “startTimeUtc”: “2022-02-03T01:03:27.7594590Z”, “durationInMs”: 0.0621},{“event”: “Completed”, “startTimeUtc”: “2022-02-03T01:03:27.7595211Z”, “durationInMs”: 0}],“requestSizeInBytes”:1881,“responseMetadataSizeInBytes”:298},“TransportException”:null}​},{“ResponseTimeUTC”:“2022-02-03T01:03:27.8362170Z”,“ResourceType”:“Collection”,“OperationType”:“Head”,“LocationEndpoint”:“/”,“StoreResult”:{“ActivityId”:“82a312ab-765d-4362-abc1-257425ef55d5”,“StatusCode”:“NoContent”,“SubStatusCode”:“Unknown”,“LSN”:9692316,“PartitionKeyRangeId”:“0”,“GlobalCommittedLSN”:9692310,“ItemLSN”:8971340,“UsingLocalLSN”:false,“QuorumAckedLSN”:-1,“SessionToken”:“0#9692316#7=-1”,“CurrentWriteQuorum”:-1,“CurrentReplicaSetSize”:-1,“NumberOfReadRegions”:1,“IsValid”:true,“StorePhysicalAddress”:"rntbd://c/apps/ba
</body> </html>`

Issue Analytics

  • State:closed
  • Created 2 years ago
  • Comments:16 (9 by maintainers)

github_iconTop GitHub Comments

2reactions
ealsurcommented, Feb 4, 2022

The CancellationToken is stopping the quorum verification for Strong/Bounded Staleness, that means that quorum, for some reason, is not being achieved within those 2 seconds. The 20 Collection HEAD requests mean that the SDK checked 20 times for quorum until the token cancelled. It would be good to investigate why is it taking that long for quorum. Increasing the token time would certainly help with mitigation.

1reaction
ealsurcommented, Feb 7, 2022

First step would be to increase the CancellationToken time and see if that impacts. Afterwards, if the issue persists, filing a support ticket should help the service team investigate what could be the reason. For the service ticket it would be ideal to have the full exception if possible (no truncation).

Read more comments on GitHub >

github_iconTop Results From Across the Web

Troubleshoot Azure Cosmos DB HTTP 408 or request ...
APPLIES TO: NoSQL. The HTTP 408 error occurs if the SDK was unable to complete the request before the timeout limit occurred.
Read more >
Request Timeout in Azure Cosmos DB in sdk v3
I am inserting the data to azure cosmos db. In some time it throws an error (Request Timeout : 408). I have increased...
Read more >
Cosmos write timeout exception causing high latency #3798
I get operation timeout errors (408) on writing to cosmosdb using ... Throwing exception explicity as stream API only throws client side ...
Read more >
Azure Cosmos DB Troubleshoot Service Availability 503 ...
Learn how to troubleshoot service availability 503 and timeout 408 issues connectivity issues with Azure Cosmos DB.
Read more >
Azure Cosmos DB 4xx Status Codes
Request Timeout means the request did not complete in time. This can happen when a stored procedure, trigger, or UDF within a query...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found