Lease token was taken over by owner exception

Some of our change feed processors are stuck, and we’re seeing OperationCanceledException errors as well as CosmosException errors like this one in the logs:

Response status code does not indicate success: PreconditionFailed (412); Substatus: 0; ActivityId: ; Reason: (796 lease token was taken over by owner something-6c082e98-54b1-4fe9-9486-fc51ce2be403

What does it mean for a lease token to be taken by another owner? I was under the impression that a single lease is owned by a single compute instance. The change feed processors also seem to start running again from time to time and then halt again seemingly randomly.

Our configuration involves:

  1. A single monitored container
  2. Multiple change feed processors doing different things
  3. All of them use the same lease configuration
  4. Multiple instances of the processors on multiple hosts, where every instance name is suffixed with a GUID so that it is unique

Issue Analytics

  • State: closed
  • Created 6 months ago
  • Comments: 18 (11 by maintainers)

Top GitHub Comments

1 reaction
davidtimovski commented, Apr 12, 2023

Here’s what I found. The issue is indeed on our side and it occurred when I was migrating to Cosmos DB v3. I’ll describe it here for future reference.

We have manual checkpointing logic with a lot of other things built on top of change feed processors. In our v2 code this is how things would play out during an unhandled exception:

  1. The observer that processes the change captures the exception in a private field. If that observer’s ProcessChangesAsync method gets called again while this field is set, it logs that it’s in a faulted state and rethrows the stored exception.
  2. When this exception is thrown, the lease is released. The observer is no longer invoked.
  3. The lease is eventually acquired by another instance, and a new observer gets instantiated by the factory.
  4. Everything continues working.

When an exception occurs in our initial v3 code:

  1. Same as before, except that the private field lives inside a class that contains our delegate that processes changes.
  2. When this exception is thrown, the lease is released.
  3. After the lease renewal interval, this same delegate is invoked again to process something, but because the exception field is still set, it logs that it’s in a faulted state and rethrows the stored exception.
  4. Go to 1.

Now any instance that had an exception would halt and perpetually retry acquiring a lease and attempting to process. Ergo infinite “Lease was taken over by owner” logs.

So for us, what used to be instances of IChangeFeedObserver that were discarded every time they had an unhandled exception (v2) were now delegates that kept being reused in a perpetually faulted state (v3).
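
The difference between the two lifecycles can be modelled without the SDK. What follows is a minimal Python sketch, not Cosmos DB code: `FaultLatchingHandler`, `run_v2_style`, and `run_v3_style` are hypothetical names, and the two loops only imitate the behaviour described above (a fresh observer after each failure versus one delegate reused for the life of the processor). For simplicity the failing batch is skipped rather than retried.

```python
class FaultLatchingHandler:
    """Hypothetical handler that stores its first exception and rethrows it on every later call."""

    def __init__(self):
        self._fault = None  # the "private field" from the description

    def process(self, batch):
        if self._fault is not None:
            # Already faulted: refuse to do any work and rethrow.
            raise self._fault
        try:
            if "poison" in batch:
                raise ValueError("bad document")
            return len(batch)
        except Exception as exc:
            self._fault = exc
            raise


def run_v2_style(batches):
    """Imitates v2: after a failure the observer is discarded and a new one is created."""
    processed = 0
    handler = FaultLatchingHandler()
    for batch in batches:
        try:
            processed += handler.process(batch)
        except ValueError:
            handler = FaultLatchingHandler()  # fresh observer, fault is gone
    return processed


def run_v3_style(batches):
    """Imitates v3: the same delegate (and its stored fault) is reused forever."""
    processed = 0
    handler = FaultLatchingHandler()
    for batch in batches:
        try:
            processed += handler.process(batch)
        except ValueError:
            pass  # lease released, but the same handler is invoked next time
    return processed


batches = [["a", "b"], ["poison"], ["c"], ["d", "e"]]
print(run_v2_style(batches))  # 5: processing resumes after the bad batch
print(run_v3_style(batches))  # 2: nothing is processed after the first failure
```

In the v3-style loop the handler never recovers, which matches the "halt and perpetually retry" symptom described above.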

The fix was simply to remove the parts that kept the exception in a private field.
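
As a rough illustration of why that is enough, here is a small Python sketch (again a simulation with hypothetical names such as `StatelessHandler` and `run_reused`, not the Cosmos DB SDK). Because the handler keeps no fault state, reusing the same instance after a failure is harmless: only the bad batch fails.

```python
class StatelessHandler:
    """Hypothetical handler with no stored exception: each call stands on its own."""

    def process(self, batch):
        if "poison" in batch:
            raise ValueError("bad document")
        return len(batch)


def run_reused(handler, batches):
    """Reuses one handler for every batch, the way a long-lived delegate would be reused."""
    processed = 0
    failures = 0
    for batch in batches:
        try:
            processed += handler.process(batch)
        except ValueError:
            failures += 1  # this batch failed, later batches are unaffected
    return processed, failures


result = run_reused(StatelessHandler(), [["a", "b"], ["poison"], ["c"], ["d", "e"]])
print(result)  # (5, 1)
```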

Thank you for all the help! I wouldn’t have been able to find my mistake without it.

You can close this issue.

1 reaction
davidtimovski commented, Apr 5, 2023

@ealsur We’re investigating this, but for the time being I do not believe it to be related to Cosmos DB v3. We have a bunch of things built on top of the change feed processor SDK, and one of them, such as the batching, is probably the crux of it. As soon as I have a better understanding of what’s happening I’ll close this issue 😃
