Question: Session Consistency Sanity Checks

I have some questions about specific behaviors of Session Consistency (via session tokens) that I wasn't able to find an authoritative answer to in the docs. (I realize the answers are in the code, but I'm asking here both to hopefully prompt doc enhancements and to have some assurance I'm reading the code right.)


Question 1 - how do read retries manifest?

The specific section I'm reading has a heading:

Session consistency guarantees

In such cases, the SDK detects the specific failure on the read operation and retries the read on the hub region to ensure session consistency.

ASIDE: hub region seems to be a superseded term?

1a Is there a way to programmatically determine such an occurrence?

1b Does it manifest as the request charge being approximately double the typical?


Question 2 - does lack of a preferred region preclude multi-region reads?

Context: I have multiple write regions, but am not nominating a preferred region. I read the table as implying that the "retries the read on the hub region" clause should not apply. But I'd like some clarifications:

2a Can I assume that not setting a preferred region means all reads stay in one region (even under 429s etc.)?

2b In this scenario (2 regions, multi-master, no region preference), can the CosmosClient's internal SessionToken management result in a retry roundtrip, with two sets of charges accumulating?


Question 3 - For SDK >= 3.0.9, can I assume that Session Tokens in responses get updated into the CosmosClient for all actions: Read, Write, Execute Stored Proc?

3a If I write and read on the same session synchronously, will the session token from the write get saved to the client's tokens and propagate to a read operation I do on the same client directly after? (Or do I need to supply it in the options for that read, for the read-my-writes requirement to be triggered?)

3b If the write took place in a stored proc, can I assume the token from the response propagates to the session and gets applied e.g. to a subsequent Read operation I do on the same Container?


Question 4 - In conclusion, is touching the SessionToken only ever needed to be able to manage chaining of independent 'external requests'?

A different way of expressing the ask: can I assume that touching the SessionToken and/or putting it into the options for a given request is only ever needed in the following scenarios?

  • when a writer operation has concluded, you can pass along the Session Token to the caller in order that they can pass it on to enable a follow-on request to guarantee reading of those writes
  • when the processing of an operation needs to see the writes from a prior operation [that has not necessarily taken place on this Client], the first operation (read, write, etc) that I am carrying out should have the SessionToken supplied in the Options for that first request
  • any subsequent operations can be expected to chain from this token onwards (i.e. like in question 3a, once you feed in the session token, the client should chain the session token through any sequence of read or write operations)

Question 5 - Is there an example and/or a pattern for having the handler dealing with a set of documents from the change feed be able to extract the token?

In some cases, I've observed e.g. 404s when reading a document that the change feed just told me had been inserted.

For example, if I have a CFP watching creates or updates, is there a way to obtain the SessionToken in force as the change feed was read, such that I can then read a series of documents and be assured I'm going to see the same (or newer) versions from the Container from which the change feed item emanated?

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

bartelink commented, Oct 20, 2020

Wow, didn't realize you wrote that too. I can't imagine having been able to piece the full set of expectations together without that guide knitting it all together (it's linked from one of the other docs, which is how I landed on it; I suspect an extra link or two might be justified).

Thanks once again for all the help. (The happy conclusion of all this is that, as usual, SELECT wasn't broken and PEBKAC applied; my test suite remains green under continual running.)

ealsur commented, Oct 19, 2020

I'll try to cover the answers with what I know.

Question 1: The backend returns an HTTP 404 with substatus 1002 (Read Session not available). The hub or primary region is the first region in your account region list (when you go to the Portal, for example, the first one in the list).

When the retry happens, you can see it in the Diagnostics. As for RU cost: since the first operation was a 404, I'm not sure how much the RU charge is affected. 404s might have some RU cost, but I wouldn't expect it to be the same as actually finding the item.
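To make the retry described above concrete, here is a toy Python model of it. This is not SDK code: `Region`, `Diagnostics`, `read_with_session_retry`, and the integer LSNs are hypothetical names invented for illustration, and the only facts taken from the answer are the 404/1002 status pair and the retry against the hub region. It sketches why inspecting the per-request trace is enough to tell that a session retry happened.

```python
from dataclasses import dataclass, field

# Status and substatus quoted in the answer above (Read Session not available).
READ_SESSION_NOT_AVAILABLE = (404, 1002)
OK = (200, 0)


@dataclass
class Region:
    """Hypothetical stand-in for a regional replica; lsn is how far it has caught up."""
    name: str
    lsn: int

    def read(self, session_lsn):
        # A replica that has not reached the session's LSN cannot serve the read.
        return OK if self.lsn >= session_lsn else READ_SESSION_NOT_AVAILABLE


@dataclass
class Diagnostics:
    """Hypothetical per-request trace: one (region name, status) pair per attempt."""
    attempts: list = field(default_factory=list)

    @property
    def retried_for_session(self):
        return any(status == READ_SESSION_NOT_AVAILABLE for _, status in self.attempts)


def read_with_session_retry(local, hub, session_lsn):
    """Read locally; on 404/1002, retry once against the hub region."""
    diag = Diagnostics()
    status = local.read(session_lsn)
    diag.attempts.append((local.name, status))
    if status == READ_SESSION_NOT_AVAILABLE and hub is not local:
        diag.attempts.append((hub.name, hub.read(session_lsn)))
    return diag


hub = Region("hub", lsn=10)
lagging = Region("secondary", lsn=7)
diag = read_with_session_retry(lagging, hub, session_lsn=10)
print(diag.attempts, diag.retried_for_session)
```

In this model the retry shows up as a second entry in `attempts`, which is the kind of evidence one would look for in the real Diagnostics output; the model says nothing about how RUs are charged for the failed attempt.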

Question 2: The only retry that is NOT done when you are not populating preferred regions is the "Transient connectivity on TCP protocol" one, and that article clearly states so. The other cross-region retries will go to other regions, using either your order of preference or the order defined in the account region list. The session consistency retry will always retry in the hub/primary region first.

In particular, for 2a: no, you can have a cross-region retry on a read if you hit a 404/1002. You can always turn off all cross-regional retries by flipping CosmosClientOptions.EnableEndpointRediscovery, but I would advise against it: if there is a regional failure, the SDK won't react and your requests will start to fail (unless that's what you want).

Question 3: I don't believe there was any change in that behavior. What you describe in 3a/3b is what happens in the SDK.

Question 4: That is what is described in the consistency docs: https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels#guarantees-associated-with-consistency-levels

The SDK instance maintains the session scope. If you need to extend the session scope to other clients, then you need to pass the SessionToken to those other instances. Otherwise, you don't need to read and pass it to the same instance.
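The session-scope rule above can be sketched with a small Python model. Everything here is an assumption made for illustration: `Store`, `Client`, and the use of a plain integer LSN as the "session token" are hypothetical and far simpler than the real SDK, where tokens are opaque strings. The sketch shows one instance reading its own write without any help, and a second instance needing the token handed to it once, after which it chains the token on its own.

```python
class Store:
    """Hypothetical store: a full write log plus a read replica that may lag."""

    def __init__(self):
        self.log = []         # committed writes as (key, value); LSN = index + 1
        self.replica_lsn = 0  # number of log entries the replica has applied

    def write(self, key, value):
        self.log.append((key, value))
        return len(self.log)  # LSN of this write, used as the session token

    def read(self, key, min_lsn):
        # Serve from the replica if it has reached min_lsn, else from the full log.
        upto = self.replica_lsn if self.replica_lsn >= min_lsn else len(self.log)
        value = None
        for k, v in self.log[:upto]:
            if k == key:
                value = v
        return value, upto


class Client:
    """Hypothetical client instance that tracks its own session token."""

    def __init__(self, store):
        self.store = store
        self.session_lsn = 0

    def write(self, key, value):
        self.session_lsn = max(self.session_lsn, self.store.write(key, value))
        return self.session_lsn

    def read(self, key, session_token=None):
        # An externally supplied token widens this instance's session scope.
        if session_token is not None:
            self.session_lsn = max(self.session_lsn, session_token)
        value, seen = self.store.read(key, self.session_lsn)
        self.session_lsn = max(self.session_lsn, seen)
        return value


store = Store()
writer, reader = Client(store), Client(store)
token = writer.write("doc1", "v1")

own_read = writer.read("doc1")                       # same instance: sees its write
stale_read = reader.read("doc1")                     # other instance, no token
token_read = reader.read("doc1", session_token=token)
chained_read = reader.read("doc1")                   # token now held by reader
print(own_read, stale_read, token_read, chained_read)
```

The second instance misses the write until the token is supplied, which matches the scenarios listed in Question 4: the token only needs to be handled explicitly when crossing from one client instance to another.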

Question 5: Normally this happens when the read you do afterwards is on a different client instance, in which case you need to obtain the SessionToken from the Change Feed read (for the Push Processor, I'm trying to add this along with manual checkpoint support). See: https://github.com/Azure/azure-cosmos-dotnet-v3/issues/1765
