Lease token was taken over by owner exception

Some of our change feed processors are stuck and we’re seeing some OperationCanceledException as well as these types of CosmosExceptions in the logs:

Response status code does not indicate success: PreconditionFailed (412); Substatus: 0; ActivityId: ; Reason: (796 lease token was taken over by owner something-6c082e98-54b1-4fe9-9486-fc51ce2be403

What does it mean for a lease token to be taken by another owner? I was under the impression that a single lease is owned by a single compute instance. The change feed processors also seem to start running again from time to time and then halt again seemingly randomly.

Our configuration involves:

A single monitored container
Multiple change feed processors doing different things
All of them use the same lease configuration
Multiple instances of the processors on multiple hosts, where every instance name is suffixed with a GUID so that it is unique

Issue Analytics

Created 6 months ago
Comments: 18 (11 by maintainers)

Top GitHub Comments

davidtimovski commented, Apr 12, 2023

Here’s what I found. The issue is indeed on our side and it occurred when I was migrating to Cosmos DB v3. I’ll describe it here for future reference.

We have manual checkpointing logic with a lot of other things built on top of change feed processors. In our v2 code this is how things would play out during an unhandled exception:

1. The observer that processes the change captures the exception in a private field. If that observer’s ProcessChangesAsync method gets called again and this field is set, it logs that it’s in a faulted state and throws it again.
2. When this exception is thrown the lease is released. The observer is no longer invoked.
3. The lease is eventually acquired by another instance and a new observer gets instantiated by the factory.
4. Everything continues working.

When an exception occurs in our initial v3 code:

1. Same thing as before, although the private field is inside a class that contains our delegate that processes changes.
2. When this exception is thrown the lease is released.
3. After the lease renewal interval this same delegate is invoked again to process something, but due to the exception field still being set, it logs that it’s in a faulted state and throws it again.
4. Go to 1.

Now any instance that had an exception would halt and perpetually retry acquiring a lease and attempting to process. Ergo infinite “Lease was taken over by owner” logs.

So for us, what used to be instances of IChangeFeedObserver that were discarded every time they hit an unhandled exception (v2) were now delegates that kept being reused in a perpetually faulted state (v3).
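
The difference between the two flows can be modeled in a few lines. What follows is a minimal, SDK-free Python simulation, not the poster's code and not the Cosmos DB API: `StickyHandler`, `run`, the `flaky` set, and the `max_attempts` limit are all hypothetical names introduced for illustration. It assumes a failure that is transient (the batch would succeed on a retry) and treats every thrown exception as one lease release.

```python
class StickyHandler:
    """Hypothetical handler that latches its first exception in a private field."""

    def __init__(self, flaky):
        self._flaky = flaky   # shared set of batches that fail on their first attempt
        self._fault = None    # the "private field" holding the captured exception

    def process(self, batch):
        if self._fault is not None:
            # Faulted state: rethrow the stored exception without processing.
            raise self._fault
        if batch in self._flaky:
            self._flaky.discard(batch)  # the failure is transient: a retry would succeed
            self._fault = RuntimeError(f"transient failure on {batch}")
            raise self._fault


def run(batches, make_handler, fresh_handler_per_lease, max_attempts=5):
    """Drive a handler over batches, counting one lease release per exception.

    fresh_handler_per_lease=True models the v2 flow (a new observer is created
    when the lease is re-acquired); False models the v3 flow (the same delegate
    object is invoked again).
    """
    handler = make_handler()
    processed, lease_releases = [], 0
    for batch in batches:
        succeeded = False
        for _ in range(max_attempts):
            try:
                handler.process(batch)
            except RuntimeError:
                lease_releases += 1
                if fresh_handler_per_lease:
                    handler = make_handler()  # fresh observer, fault is gone
            else:
                processed.append(batch)
                succeeded = True
                break
        if not succeeded:
            break  # the feed cannot advance past a batch that never succeeds
    return processed, lease_releases


flaky_v2 = {"b"}
v2_result = run(["a", "b", "c"], lambda: StickyHandler(flaky_v2), True)

flaky_v3 = {"b"}
v3_result = run(["a", "b", "c"], lambda: StickyHandler(flaky_v3), False)
```

In this model the v2-style run recovers after a single release and processes all three batches, while the v3-style run never gets past the second batch and keeps releasing the lease until the attempt limit is reached, which mirrors the repeated log entries described above.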

I just removed the parts about keeping the exception in a private field.
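
As a rough illustration of why removing the field is enough, here is a small Python sketch under the same kind of assumptions: it is a simulation rather than Cosmos DB SDK code, and `StatelessHandler`, `run_reused`, and `max_attempts` are hypothetical names. The handler still throws on a failing batch, but because nothing is stored, the next invocation of the same object starts clean.

```python
class StatelessHandler:
    """Hypothetical handler that throws on failure but keeps no record of it."""

    def __init__(self, flaky):
        self._flaky = flaky  # batches that fail on their first attempt only

    def process(self, batch):
        if batch in self._flaky:
            self._flaky.discard(batch)  # transient: the retry will succeed
            raise RuntimeError(f"transient failure on {batch}")


def run_reused(batches, handler, max_attempts=5):
    """Invoke the SAME handler object for every attempt, as a reused delegate would be."""
    processed, failures = [], 0
    for batch in batches:
        succeeded = False
        for _ in range(max_attempts):
            try:
                handler.process(batch)
            except RuntimeError:
                failures += 1  # lease released; the same handler is called again later
            else:
                processed.append(batch)
                succeeded = True
                break
        if not succeeded:
            break
    return processed, failures


result = run_reused(["a", "b", "c"], StatelessHandler({"b"}))
```

Here the single failure costs one retry and processing then continues, even though the handler object is never replaced.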

Thank you for all the help! I wouldn’t have been able to find my mistake without it.

You can close this issue.

davidtimovski commented, Apr 5, 2023

@ealsur We’re investigating this, but for the time being I do not believe it is related to Cosmos DB v3. We have a number of things built on top of the change feed processor SDK, and some of them, such as the batching, are probably the crux of it. As soon as I understand better what’s happening, I’ll close this issue 😃