Error "The read/write session is not available" in Cosmos JS SDK v3
- Package Name: @azure/cosmos
- Package Version: 3.7.4
Describe the bug
We are updating from v2 to v3 progressively and have started getting an error that we have never seen before: “The read/write session is not available”. Here is a sample with some data from the response:
body: {
code: NotFound,
message:
Message: {"Errors":["The read\/write session is not available."]}
ActivityId: , Request URI: , RequestStats:
RequestStartTime: 2020-07-22T23:14:19.0498991Z, RequestEndTime: 2020-07-22T23:14:19.0498991Z, Number of regions attempted:1
ResponseTime: 2020-07-22T23:14:19.0498991Z, StoreResult: StorePhysicalAddress: rntbd:, PartitionKeyRangeId: , IsValid: True, StatusCode: 404, SubStatusCode: 1002, RequestCharge: 0, ItemLSN: -1, SessionToken:, UsingLocalLSN: True, TransportException: null, ResourceType: Document, OperationType: Query
ResponseTime: 2020-07-22T23:14:19.0498991Z, StoreResult: StorePhysicalAddress: rntbd:, PartitionKeyRangeId: , IsValid: True, StatusCode: 404, SubStatusCode: 1002, RequestCharge: 0, ItemLSN: -1, SessionToken:, UsingLocalLSN: True, TransportException: null, ResourceType: Document, OperationType: Query
, SDK: Microsoft.Azure.Documents.Common/2.11.0
},
code: 404,
headers: {
content-type: application/json,
date: Wed, 22 Jul 2020 23:14:18 GMT,
server: Microsoft-HTTPAPI/2.0,
strict-transport-security: max-age=31536000,
transfer-encoding: chunked,
x-ms-activity-id: ,
x-ms-cosmos-llsn: 141581657,
x-ms-gatewayversion: version=2.11.0,
x-ms-global-committed-lsn: 141581652,
x-ms-last-state-change-utc: ,
x-ms-number-of-read-regions: 1,
x-ms-request-charge: 0,
x-ms-schemaversion: 1.9,
x-ms-serviceversion: version=2.11.0.0,
x-ms-session-token: ,
x-ms-substatus: 1002,
x-ms-throttle-retry-count: 0,
x-ms-throttle-retry-wait-time-ms: 0,
x-ms-transport-request-id: 25431,
x-ms-xp-role: 2
}
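In the response above, the error is identified by the combination of status code 404 and sub-status 1002 (the `x-ms-substatus` header), not by the 404 alone. A minimal sketch of a check for that combination follows. The helper name `isSessionNotAvailable` and the `CosmosLikeError` interface are hypothetical and only model the fields visible in the dump; they are not part of `@azure/cosmos`.

```typescript
// Hypothetical shape covering only the fields visible in the dump above.
interface CosmosLikeError {
  code?: number | string;
  headers?: Record<string, string | number | undefined>;
}

// Hypothetical helper: true only for a 404 whose sub-status is 1002.
// Header values may arrive as strings, so both sides are coerced to numbers.
function isSessionNotAvailable(err: CosmosLikeError): boolean {
  const subStatus = Number(err.headers?.["x-ms-substatus"]);
  return Number(err.code) === 404 && subStatus === 1002;
}
```

A check like this makes it possible to log or count these failures separately from ordinary "document not found" responses, which also use status 404.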
To Reproduce
There is no direct way to reproduce this. We get the error a few times a day on a very large collection.
Expected behavior
No error, or at least an error with a clear indication of why it happens so that we can prevent it.
Additional context
We have also noticed very frequent Timeout errors with a requestTimeout of 30 seconds. This wasn’t happening before because we were retrying on ECONNRESET, but the error code has now changed and we are no longer retrying.
This is very concerning, because the retries were hiding requests that took more than 30 seconds. I’m not sure what we should do about this; getting such timeouts so often seems very odd. Maybe the session error happens for the same reason. It could be that the code changed and we are not retrying anymore.
What should we do about these? We can’t confidently migrate to v3 until we have a way to understand and tackle this.
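To make the retry question concrete, here is a rough sketch of the kind of application-level retry wrapper described above, extended so that the set of retryable error codes is explicit. Everything in it is an assumption for illustration: the names `withRetry`, `isRetryable`, `backoffDelayMs` and `RETRYABLE_CODES` are hypothetical, and the string `"TimeoutError"` is a guess at the code v3 reports for request timeouts, which should be confirmed by logging a real failure.

```typescript
// Hypothetical error shape: `code` is whatever the SDK version in use reports.
type ErrorWithCode = { code?: number | string };

// Assumption: codes worth retrying. "TimeoutError" is a guess at the v3 code;
// confirm it against a logged failure before relying on it.
const RETRYABLE_CODES = new Set<number | string>(["ECONNRESET", "TimeoutError"]);

function isRetryable(err: unknown): boolean {
  const code = (err as ErrorWithCode | null)?.code;
  return code !== undefined && RETRYABLE_CODES.has(code);
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Runs `op` up to `maxAttempts` times, retrying only on retryable codes.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 100
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Give up on the last attempt or on errors we do not recognise.
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      const delay = backoffDelayMs(attempt, baseMs);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A read could then be wrapped as `withRetry(() => container.items.query(query).fetchAll())`. Wrapping writes this way is riskier, since a request that timed out on the client may still have been applied by the service, and a wrapper like this hides slow requests in the same way the old ECONNRESET retry did unless each retry is logged.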
Top GitHub Comments
We are seeing exactly the same issue here.
For anyone following this issue: we did uncover a bug in the retry code that could cause this error. It was fixed and released in 3.13.1: https://github.com/Azure/azure-sdk-for-js/pull/17034