Question: Session Consistency Sanity Checks

I have some questions for which I wasn’t able to find an authoritative answer in the docs, regarding specific behaviors of Session Consistency (via session tokens). (I realize the answers are in the code, but I’m asking here both to hopefully prompt doc enhancements and to have some assurance I’m reading the code right.)

Question 1 - how do read retries manifest?

The specific section I’m reading has a heading:

Session consistency guarantees

In such cases, the SDK detects the specific failure on the read operation and retries the read on the hub region to ensure session consistency.

ASIDE: hub region seems to be a superseded term?

1a Is there a way to programmatically determine such an occurrence?

1b Does it manifest as the request charge being approximately double the typical?

Question 2 - does lack of a preferred region preclude multi-region reads?

Context: I have multiple write regions, but am not nominating a preferred region. I read the table as implying that the “retries the read on the hub region” clause should not apply. But I’d like some clarifications:

2a Can I assume that, because I don’t set a preferred region, all reads stay in one region (even under 429s etc.)?

2b In this scenario (2 regions, multi-master, no region preference), can the CosmosClient’s internal SessionToken management result in a retry roundtrip, with two sets of charges accumulating?

Question 3 - for SDK >= 3.0.9, can I assume that Session Tokens in responses get updated into the CosmosClient for all actions: Read, Write, Execute Stored Proc?

3a If I write and read on the same session synchronously, will the session token from the write get saved to the client’s tokens and propagate to a read operation I do on the same client directly after? (Or do I need to supply it in the options for that read, for the read-my-writes requirement to be triggered?)

3b If the write took place in a stored proc, can I assume the token from the response propagates to the session and gets applied e.g. to a subsequent Read operation I do on the same Container?

Question 4 - In conclusion, is touching the SessionToken only ever needed to be able to manage chaining of independent ‘external requests’?

A different way of expressing the ask: can I assume that touching the SessionToken and/or putting it into the options for a given request is only ever needed in the following scenarios:

when a writer operation has concluded, you can pass along the Session Token to the caller in order that they can pass it on to enable a follow-on request to guarantee reading of those writes
when the processing of an operation needs to see the writes from a prior operation [that has not necessarily taken place on this Client], the first operation (read, write, etc) that I am carrying out should have the SessionToken supplied in the Options for that first request
any subsequent operations can be expected to chain from this token onwards (i.e. like in question 3a, once you feed in the session token, the client should chain the session token through any sequence of read or write operations)

Question 5 - Is there an example and/or a pattern for having the handler dealing with a set of documents from the change feed be able to extract the token?

In some cases, I’ve observed e.g. 404s when reading a document that the changefeed just told me had been inserted.

For example, if I have a CFP watching creates or updates, is there a way to obtain the SessionToken in force as the changefeed was read, such that I can then read a series of documents and be assured I’m going to see the same (or newer) versions from the Container from which the change feed item emanated?

Top GitHub Comments

bartelink commented, Oct 20, 2020

Wow, didn’t realize you wrote that too 😁 👏 I can’t imagine having been able to piece the full set of expectations together without that guide knitting it all together (it’s linked from one of the other docs, which is how I landed on it; I suspect an extra link or two might be justified)

Thanks once again for all the help. (The happy conclusion of all this is that I was able to conclude that, as usual, SELECT wasn’t broken and PEBKAC applied as usual; my test suite remains green under continual running)

ealsur commented, Oct 19, 2020

I’ll try to cover the answers with what I know.

Question 1: The backend returns an HTTP 404 with substatus 1002 (Read Session not available). The hub or primary region is the first region in your account region list (when you go to the Portal for example, the first one in the list).

When the retry happens, you can see it in the Diagnostics. As for the RU cost: since the first operation was a 404, I’m not sure how much the RU charge is affected. 404s might have some RU cost, but I wouldn’t expect it to be the same as actually finding the item.
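To make the retry flow concrete, here is a minimal toy model in Python. It is not the Cosmos DB SDK: the `Region` class, the `read_with_session_retry` function, and the use of a plain integer LSN as the session token are all assumptions made for illustration. It only sketches the idea that a lagging region answers 404/1002, the read is retried on the hub region, and both attempts show up in a diagnostics list.

```python
from dataclasses import dataclass, field

NOT_FOUND = 404
READ_SESSION_NOT_AVAILABLE = 1002  # substatus named in the answer above


@dataclass
class Region:
    """Hypothetical replica: holds items and the highest LSN it has applied."""
    name: str
    lsn: int = 0
    items: dict = field(default_factory=dict)

    def read(self, item_id, session_lsn):
        # A region that has not caught up to the session's LSN cannot
        # serve the read under session consistency.
        if self.lsn < session_lsn:
            return NOT_FOUND, READ_SESSION_NOT_AVAILABLE, None
        if item_id not in self.items:
            return NOT_FOUND, 0, None
        return 200, 0, self.items[item_id]


def read_with_session_retry(item_id, session_lsn, local, hub):
    """Read locally; on 404/1002, retry once against the hub region.

    Returns (status, value, diagnostics); diagnostics records every attempt.
    """
    diagnostics = []
    status, substatus, value = local.read(item_id, session_lsn)
    diagnostics.append((local.name, status, substatus))
    if (status, substatus) == (NOT_FOUND, READ_SESSION_NOT_AVAILABLE):
        status, substatus, value = hub.read(item_id, session_lsn)
        diagnostics.append((hub.name, status, substatus))
    return status, value, diagnostics


hub = Region("hub", lsn=5, items={"doc1": "v5"})
lagging = Region("local", lsn=3)
status, value, diagnostics = read_with_session_retry("doc1", 5, lagging, hub)
print(status, value, diagnostics)
```

In this sketch a retried read is recognisable because the diagnostics list has two entries, the first of which carries the 404/1002 pair. The model says nothing about RU charges.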

Question 2: The only retry that is NOT done when you are not populating preferred regions is the “Transient connectivity on TCP protocol” one, and that article clearly states so. The other cross-region retries will go to other regions, using either your order of preference or the order defined in the account region list. Session consistency retry will always retry in the hub/primary region first.

Particularly for 2.a: no, you can have a cross-region retry on a Read if you face a 404/1002. You can always turn off all cross-regional retries by flipping the CosmosClientOptions.EnableEndpointRediscovery but I would advise against it because if there is a regional failure, the SDK won’t react, and you will start to fail (unless that’s what you want).

Question 3: I don’t believe there was any change in that behavior. What you describe on 3.a/3.b is what happens in the SDK.

Question 4: That is what is described in the consistency docs: https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels#guarantees-associated-with-consistency-levels

The SDK instance maintains the session scope. If you need to extend the session scope to other clients, then you need to pass the SessionToken to those other instances. Otherwise, you don’t need to read and pass it to the same instance.
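The session scope described above can be sketched with another toy model in Python. Again, this is not the Cosmos DB SDK: `Container`, `Client`, and the integer token are hypothetical stand-ins (real session tokens are opaque strings). The sketch assumes a replica that never catches up, so a client with no session state reads stale data, while a client that wrote, or that was handed the writer's token, reads the write.

```python
class Container:
    """Hypothetical store: a primary plus a replica that lags behind it."""

    def __init__(self):
        self.lsn = 0
        self.primary = {}
        self.replica = {}      # never synced in this sketch
        self.replica_lsn = 0

    def write(self, key, value):
        self.lsn += 1
        self.primary[key] = value
        return self.lsn  # stands in for the session token on the response

    def read(self, key, min_lsn):
        # Serve from the lagging replica unless the session needs newer data.
        if self.replica_lsn >= min_lsn:
            return self.replica.get(key), self.replica_lsn
        return self.primary.get(key), self.lsn


class Client:
    """Hypothetical client instance that tracks its own session token."""

    def __init__(self, container):
        self.container = container
        self.session_lsn = 0

    def write(self, key, value):
        token = self.container.write(key, value)
        self.session_lsn = max(self.session_lsn, token)
        return token

    def read(self, key, session_token=None):
        # An explicitly supplied token extends this client's session.
        if session_token is not None:
            self.session_lsn = max(self.session_lsn, session_token)
        value, seen = self.container.read(key, self.session_lsn)
        self.session_lsn = max(self.session_lsn, seen)
        return value


container = Container()
writer, reader = Client(container), Client(container)

token = writer.write("doc1", "v1")
same_client = writer.read("doc1")              # token tracked internally
stale = reader.read("doc1")                    # other instance, no token
fresh = reader.read("doc1", session_token=token)
chained = reader.read("doc1")                  # token now chains onwards
print(same_client, stale, fresh, chained)
```

The last read shows the chaining asked about in Question 4: once the token has been supplied on one request, this toy client carries it through later operations without it being passed again.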

Question 5: Normally this happens when the read you do after is on a different client instance, in which case, you need to obtain the SessionToken from the Change Feed read (for the Push Processor, I’m trying to add this along with manual checkpoint support). See: https://github.com/Azure/azure-cosmos-dotnet-v3/issues/1765