
Response status code does not indicate success: RequestTimeout (408)


This is similar to this issue, but in our case the CPU seems to ramp up to 97%, and I can't understand why.

Our Cosmos DB account is set to autoscale, and we haven't exceeded 50% of the maximum RU consumption in the last 7 days.

The update is requested from an Azure Functions v4 app (Linux, .NET 6, isolated process) on the Premium plan.

I followed this document: https://docs.microsoft.com/en-us/azure/cosmos-db/sql/troubleshoot-dot-net-sdk-request-timeout?tabs=cpu-new#high-cpu-utilization

and cross-checked every point:

  1. All SNAT connections were successful (in the last 24 hours).
  2. We use our CosmosContext, which inherits from DbContext:

services.AddDbContext<CosmosContext>(options =>
{
	options.UseCosmos(configuration[AppSettingsKeys.CosmosDbConnection], "somenamehere");
});

     which internally creates a singleton CosmosClient.

  3. We are nowhere near the service limits.

  4. There is no HTTP proxy.

This happens in a particular function that pulls the document and updates nested properties. The document is around 40 KB. The function has a Service Bus trigger and the following retry policy:

	"retry": {
		"strategy": "exponentialBackoff",
		"maxRetryCount": 3,
		"minimumInterval": "00:00:03",
		"maximumInterval": "00:00:10"
	},

I have no idea what’s going on.

Here are the diagnostics recorded in the exception details:

"Diagnostics":{
    "name":"ReplaceItemStreamAsync",
    "id":"d4330cac-9cd4-4fb9-ac70-26a0942b96a6",
    "caller info":{
       "member":"OperationHelperWithRootTraceAsync",
       "file":"ClientContextCore.cs",
       "line":244
    },
    "start time":"10:45:08:241",
    "duration in milliseconds":12210.9945,
    "data":{
       "Client Configuration":{
          "Client Created Time Utc":"2022-06-10T11:56:21.5647195Z",
          "NumberOfClientsCreated":2,
          "User Agent":"cosmos-netstandard-sdk/3.21.0|3.21.1|2|X64|Linux 5.4.0-1074-azure 77 18.|.NET 6.0.5|N| Microsoft.EntityFrameworkCore.Cosmos/6.0.5",
          "ConnectionConfig":{
             "gw":"(cps:50, urto:10, p:False, httpf: False)",
             "rntbd":"(cto: 5, icto: -1, mrpc: 30, mcpe: 65535, erd: True, pr: ReuseUnicastPort)",
             "other":"(ed:False, be:False)"
          },
          "ConsistencyConfig":"(consistency: NotSet, prgns:[])"
       }
    },
    "children":[
       {
          "name":"Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
          "id":"3892e9c8-a327-4ae2-a1b4-4b30b552721c",
          "start time":"10:45:08:241",
          "duration in milliseconds":12210.9644,
          "children":[
             {
                "name":"Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler",
                "id":"042f8751-3514-46ac-bd3b-e51ff061ac70",
                "start time":"10:45:08:241",
                "duration in milliseconds":12210.932,
                "data":{
                   "System Info":{
                      "systemHistory":[
                         {
                            "dateUtc":"2022-06-14T10:44:12.4755898Z",
                            "cpu":9.907,
                            "memory":3178468.000,
                            "threadInfo":{
                               "isThreadStarving":"False",
                               "threadWaitIntervalInMs":0.0213,
                               "availableThreads":32766,
                               "minThreads":2,
                               "maxThreads":32767
                            }
                         },
                         {
                            "dateUtc":"2022-06-14T10:44:22.4788493Z",
                            "cpu":4.343,
                            "memory":3178484.000,
                            "threadInfo":{
                               "isThreadStarving":"False",
                               "threadWaitIntervalInMs":0.0088,
                               "availableThreads":32766,
                               "minThreads":2,
                               "maxThreads":32767
                            }
                         },
                         {
                            "dateUtc":"2022-06-14T10:44:39.0703495Z",
                            "cpu":79.250,
                            "memory":3484276.000,
                            "threadInfo":{
                               "isThreadStarving":"False",
                               "threadWaitIntervalInMs":0.209,
                               "availableThreads":32756,
                               "minThreads":2,
                               "maxThreads":32767
                            }
                         },
                         {
                            "dateUtc":"2022-06-14T10:44:51.4720374Z",
                            "cpu":79.208,
                            "memory":2110288.000,
                            "threadInfo":{
                               "isThreadStarving":"False",
                               "threadWaitIntervalInMs":6.154,
                               "availableThreads":32737,
                               "minThreads":2,
                               "maxThreads":32767
                            }
                         },
                         {
                            "dateUtc":"2022-06-14T10:45:01.5421178Z",
                            "cpu":82.129,
                            "memory":959112.000,
                            "threadInfo":{
                               "isThreadStarving":"False",
                               "threadWaitIntervalInMs":0.3395,
                               "availableThreads":32732,
                               "minThreads":2,
                               "maxThreads":32767
                            }
                         },
                         {
                            "dateUtc":"2022-06-14T10:45:20.1404512Z",
                            "cpu":97.987,
                            "memory":1891392.000,
                            "threadInfo":{
                               "isThreadStarving":"False",
                               "threadWaitIntervalInMs":1.2721,
                               "availableThreads":32730,
                               "minThreads":2,
                               "maxThreads":32767
                            }
                         }
                      ]
                   }
                },
                "children":[
                   {
                      "name":"Microsoft.Azure.Cosmos.Handlers.RetryHandler",
                      "id":"b87c1d09-2c23-470f-988e-70558cfcdcb5",
                      "start time":"10:45:08:241",
                      "duration in milliseconds":12210.9261,
                      "children":[
                         {
                            "name":"Microsoft.Azure.Cosmos.Handlers.RouterHandler",
                            "id":"b513f900-a379-4bfe-b5f3-9d52d15398ff",
                            "start time":"10:45:08:241",
                            "duration in milliseconds":12210.7416,
                            "children":[
                               {
                                  "name":"Microsoft.Azure.Cosmos.Handlers.TransportHandler",
                                  "id":"27ab336b-34d4-405d-9534-ab79980d0b29",
                                  "start time":"10:45:08:241",
                                  "duration in milliseconds":12210.6676,
                                  "children":[
                                     {
                                        "name":"Microsoft.Azure.Documents.ServerStoreModel Transport Request",
                                        "id":"ee060395-4562-4b8c-a6b8-c24daf7d3e45",
                                        "caller info":{
                                           "member":"ProcessMessageAsync",
                                           "file":"TransportHandler.cs",
                                           "line":109
                                        },
                                        "start time":"10:45:08:241",
                                        "duration in milliseconds":12169.0857,
                                        "data":{
                                           "Client Side Request Stats":{
                                              "Id":"AggregatedClientSideRequestStatistics",
                                              "ContactedReplicas":[
                                                 {
                                                    "Count":1,
                                                    "Uri":""
                                                 },
                                                 {
                                                    "Count":1,
                                                    "Uri":""
                                                 },
                                                 {
                                                    "Count":1,
                                                    "Uri":""
                                                 }
                                              ],
                                              "RegionsContacted":[
                                                 
                                              ],
                                              "FailedReplicas":[
                                                 
                                              ],
                                              "AddressResolutionStatistics":[
                                                 
                                              ],
                                              "StoreResponseStatistics":[
                                                 
                                              ]
                                           }
                                        }
                                     }
                                  ]
                               }
                            ]
                         }
                      ]
                   }
                ]
             }
          ]
       }
    ]
 }

Additionally, I get “ghost updates”:

product.UpdateStock(5);
await _cosmosContext.SaveChangesAsync(CancellationToken);

_logger.Information("Stock Update {@Request}", new
{
   product.StockQuantity,
});

The log says the document was updated (product.StockQuantity = 5), but querying the actual document shows it still holds the value from the previous update (product.StockQuantity = 0).

No exception is thrown for this particular update.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

ealsur commented, Jun 16, 2022
"NumberOfClientsCreated":2,
"NumberOfActiveClients":2,

There are still 2 clients created and active. Is this what you expect?
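If some code path in the app constructs its own client in addition to the one EF Core manages (an assumption; the issue does not say so), a common way to guarantee one instance per process is a Lazy<T> holder. A minimal sketch, where the hypothetical CreateClient factory stands in for new CosmosClient(...) and counts how often it runs:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int clientsCreated = 0;

// Hypothetical stand-in for an expensive client such as CosmosClient.
object CreateClient()
{
    Interlocked.Increment(ref clientsCreated);
    return new object();
}

// ExecutionAndPublication runs the factory at most once,
// even when many threads race to read .Value.
var sharedClient = new Lazy<object>(CreateClient, LazyThreadSafetyMode.ExecutionAndPublication);

// Simulate many concurrent function invocations asking for the client.
Parallel.For(0, 100, i => { _ = sharedClient.Value; });

Console.WriteLine($"Clients created: {clientsCreated}");
```

Registering the client once with services.AddSingleton in the DI container achieves the same effect without a hand-written holder.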

"TransportException":"A client transport error occurred: The request timed out while waiting for a server response. (Time: 2022-06-15T15:28:26.8329201Z, activity ID: c183650a-ebb4-4909-8232-6409a240844d, 
error code: ReceiveTimeout [0x0010], base error: HRESULT 0x80131500, URI: rntbd://cdb-ms-prod-uaenorth1-fd3.documents.azure.com:14068/apps/b7085295-6925-4496-875f-71e670ae930b/services/0b3be168-340e-4390-9048-5330613d4e4a/partitions/e834bdb7-4ae4-4818-876c-76106624da5f/replicas/132923450053698351p/, connection: 169.254.129.3:40414 -> 40.120.74.64:14068, payload sent: True)"

This is a timeout. There are 2 potential issues:

{
   "event":"Transit Time",
   "startTimeUtc":"2022-06-15T15:28:16.6202234Z",
   "durationInMs":2543.9462
},
{
   "event":"Received",
   "startTimeUtc":"2022-06-15T15:28:19.1641696Z",
   "durationInMs":7668.9161
},

You have a high Transit Time, meaning something is not entirely right in the network (over 2.5 seconds in transit for a single request is massive).

Very high time on Received: the response sat for ~8 seconds waiting to be consumed, which points to thread pool issues. Receiving the I/O response is an async operation, and this is the time before the async Task is processed, meaning the thread pool could not assign a thread to continue that Task for 8 seconds. This usually points to application code blocking threads (https://docs.microsoft.com/en-us/azure/cosmos-db/sql/troubleshoot-dot-net-sdk-slow-request?tabs=cpu-new#rntbdRequestStats): some code might not be following async/await and instead be using .Result, .GetAwaiter().GetResult(), etc., which blocks threads and prevents the thread pool from using them to resume async operations. This can also lead to high CPU usage. Useful guide: https://github.com/davidfowl/AspNetCoreDiagnosticScenarios/blob/master/AsyncGuidance.md#avoid-using-taskresult-and-taskwait
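A minimal sketch of the blocking pattern described above next to its async/await alternative. UpdateStockAsync is a hypothetical stand-in for an SDK call; the example only shows where a thread gets blocked and does not reproduce thread-pool starvation itself:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for an SDK I/O call (e.g. a document update).
static async Task<int> UpdateStockAsync(int quantity)
{
    await Task.Delay(50); // simulated network round trip
    return quantity;
}

// Anti-pattern: the calling thread is blocked for the whole round trip and
// cannot be reused by the thread pool to run other continuations.
static int UpdateStockBlocking(int quantity) =>
    UpdateStockAsync(quantity).GetAwaiter().GetResult();

// Preferred: the thread returns to the pool while the I/O is pending, and the
// continuation is scheduled when the response arrives.
static async Task<int> UpdateStockProperlyAsync(int quantity) =>
    await UpdateStockAsync(quantity);

int blocking = UpdateStockBlocking(5);
int awaited = await UpdateStockProperlyAsync(5);
Console.WriteLine($"blocking={blocking}, awaited={awaited}");
```

Both calls return the same value; the difference is that under load every blocking call pins a thread-pool thread for the full round trip, which is the kind of delay the Received timing above reflects.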

CPU values on Linux are obtained from the aggregate cpu line in /proc/stat, so they reflect system-wide CPU. I don't know what the metrics in the Portal read.
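For reference, a minimal sketch of how a system-wide CPU percentage is commonly derived from two samples of the aggregate cpu line in /proc/stat: busy jiffies divided by total jiffies between the samples, counting idle plus iowait as idle. The SDK's actual implementation may differ, and the sample lines below are made up:

```csharp
using System;
using System.Globalization;
using System.Linq;

// Parses the aggregate "cpu" line of /proc/stat into (idle, total) jiffies.
static (ulong Idle, ulong Total) ParseCpuLine(string line)
{
    ulong[] fields = line
        .Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Skip(1) // drop the "cpu" label
        .Take(8) // user nice system idle iowait irq softirq steal (guest is already counted in user)
        .Select(f => ulong.Parse(f, CultureInfo.InvariantCulture))
        .ToArray();
    ulong idle = fields[3] + fields[4]; // idle + iowait
    ulong total = 0;
    foreach (ulong f in fields) total += f;
    return (idle, total);
}

// Percentage of non-idle time between two samples.
static double CpuUsagePercent(string before, string after)
{
    var (idle0, total0) = ParseCpuLine(before);
    var (idle1, total1) = ParseCpuLine(after);
    ulong totalDelta = total1 - total0;
    ulong idleDelta = idle1 - idle0;
    return totalDelta == 0 ? 0.0 : 100.0 * (totalDelta - idleDelta) / totalDelta;
}

// Two hypothetical samples taken a few seconds apart.
string sample1 = "cpu  1000 0 500 8000 500 0 0 0 0 0";
string sample2 = "cpu  1800 0 900 8200 600 0 0 0 0 0";
double usage = CpuUsagePercent(sample1, sample2);
Console.WriteLine($"CPU usage: {usage.ToString("F1", CultureInfo.InvariantCulture)}%");
```

On a real Linux host the two lines would come from File.ReadLines("/proc/stat").First() read a few seconds apart. Because the figure is system-wide, any other process on the same instance contributes to it.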

Transient timeouts can happen, and the app should have a way to handle them: https://docs.microsoft.com/en-us/azure/cosmos-db/sql/conceptual-resilient-sdk-applications#timeouts-and-connectivity-related-failures-http-408503
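One way to handle transient timeouts is a bounded retry with exponential backoff. A minimal sketch, assuming TimeoutException as a stand-in for the SDK's 408 error (with the .NET SDK you would instead filter on a CosmosException whose StatusCode is HttpStatusCode.RequestTimeout); RetryOnTimeoutAsync and FlakyUpdateAsync are hypothetical names:

```csharp
using System;
using System.Threading.Tasks;

// Retries an operation on timeout with exponential backoff, capped at maxDelay.
// TimeoutException stands in for a 408; adapt the catch filter to your SDK.
static async Task<T> RetryOnTimeoutAsync<T>(
    Func<Task<T>> operation, int maxRetries, TimeSpan baseDelay, TimeSpan maxDelay)
{
    for (int attempt = 0; ; attempt++)
    {
        try
        {
            return await operation();
        }
        catch (TimeoutException) when (attempt < maxRetries)
        {
            // Wait 1x, 2x, 4x ... the base delay, never exceeding maxDelay.
            double ms = Math.Min(
                baseDelay.TotalMilliseconds * Math.Pow(2, attempt),
                maxDelay.TotalMilliseconds);
            await Task.Delay(TimeSpan.FromMilliseconds(ms));
        }
    }
}

int calls = 0;

// Hypothetical operation that times out twice before succeeding.
Task<string> FlakyUpdateAsync()
{
    calls++;
    if (calls < 3) throw new TimeoutException("simulated 408");
    return Task.FromResult("updated");
}

string result = await RetryOnTimeoutAsync<string>(
    FlakyUpdateAsync,
    maxRetries: 3,
    baseDelay: TimeSpan.FromMilliseconds(10),
    maxDelay: TimeSpan.FromMilliseconds(100));
Console.WriteLine($"{result} after {calls} calls");
```

A write that timed out may or may not have been applied on the server, so only retry writes this way when re-applying them is safe.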

You should investigate when the volume of timeouts starts to affect P99 latency: https://docs.microsoft.com/en-us/azure/cosmos-db/sql/conceptual-resilient-sdk-applications#when-to-contact-customer-support

Reference: https://docs.microsoft.com/en-us/azure/cosmos-db/sql/troubleshoot-dot-net-sdk-request-timeout?tabs=cpu-new#troubleshooting-steps

ealsur commented, Jun 14, 2022

Please update the SDK to a newer version and share the updated diagnostics. The version you are using does not include diagnostics for timeouts (they were added in 3.24: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/changelog.md#-3240---2022-01-31).

The only thing we can see is that there seem to be 2 clients: "NumberOfClientsCreated":2,

We cannot tell you why your CPU is high; CPU analysis needs to be performed on the running machine.

