feat(ChangeFeedProcessor): Programmatically delete checkpoint and trigger retraversal

Is your feature request related to a problem? Please describe.

As detailed in https://github.com/Azure/azure-documentdb-changefeedprocessor-dotnet/issues/123, the API provides no programmatic way to force a restart of a CFP projection’s lease data.

Describe the solution you’d like

V1 used to expose a delete leases API; that would work for the use cases I’m aware of.

Describe alternatives you’ve considered

Without a programmatic interface like this, one is reduced to interactively messing with lease documents, and/or writing code that couples to implementation details such as the id and/or Partition Key associated with a lease.

The only other workaround is to mint a new lease id and supply the desired arguments as that’s created. This is hugely problematic when running multiple leases and/or projecting multiple containers (i.e. I can’t and don’t want to maintain some mapping in Consul, git or anything else that says that for container 3 we’re using the default2 projection because we wanted to reset it).

Additional context

https://github.com/Azure/azure-documentdb-changefeedprocessor-dotnet/issues/123

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 12 (10 by maintainers)

Top GitHub Comments

3 reactions
bartelink commented, May 6, 2021

Yes, from my perspective there continues to be the same need to be able to ask the Change Feed Processor system to purge its state.

My reasoning for this is that:

  • CFP logic in both CFP2 and Microsoft.Azure.Cosmos owns the writing, naming and the contract of the leases+checkpoints - it’s a black box
  • therefore that system should provide a mechanism to clean up its state

The only real workarounds I am aware of are:

  • have one set of leases per aux container; whack it and start again (but that wastes capacity and is not compatible with usages where there are an interesting number of CFPs running against a given monitored container)
  • understand the naming and format and go hacking in there (and programmatic equivalents of that then become Cosmos SDK version dependent)
  • keep coining new version-suffixed editions of the LeaseId (aka consumer group name) and/or generate ephemeral ones each time (but that leaves dead state and can be problematic)
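
To make the cost of that last workaround concrete, here is a minimal sketch in Python. It assumes nothing about the Cosmos SDK: `versioned_lease_id`, `ephemeral_lease_id` and the `current_version` mapping are hypothetical names used only for illustration. The point is that a "reset" becomes a version bump that something outside the processor has to remember.

```python
import uuid

# Hypothetical external mapping (the kind of thing one would otherwise
# have to keep in Consul, git, etc.): container -> current lease version.
current_version = {"container1": 1, "container3": 1}


def versioned_lease_id(base: str, version: int) -> str:
    """Build a version-suffixed lease id, e.g. 'default2'."""
    return f"{base}{version}"


def ephemeral_lease_id(base: str) -> str:
    """Build a throwaway lease id; the state written under it is never cleaned up."""
    return f"{base}-{uuid.uuid4().hex}"


def reset_projection(container: str, base: str = "default") -> str:
    """'Reset' a projection by bumping its version and returning the new lease id."""
    current_version[container] += 1
    return versioned_lease_id(base, current_version[container])


# Resetting container3 means every consumer must now learn about 'default2',
# while the leases written under 'default1' stay behind as dead state.
new_id = reset_projection("container3")
```

With a delete-leases API the mapping would be unnecessary: the lease id could stay fixed and the state under it could simply be purged.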

So yes, a basic API to delete all leases and checkpoints would be very welcome indeed. The specific place I’d make use of it is in this dotnet tool:

  • it presently has a propulsion init cosmos -c container feature to generate a fresh Lease Container
  • I would add a propulsion destroy-leases --leaseId=MyLease cosmos -c container that would call this feature

This would allow one to replace existing workflows where test rigs generate ephemeral lease ids and do lots of juggling to make that work.

1 reaction
ealsur commented, Dec 13, 2022

If a lease is deleted while it is being processed, the checkpoint will fail. The checkpoint is a Replace operation, so it fails with a 404. This causes the running Task to stop; the SDK then attempts to release the lease, which fails in the same way (Replace => 404), logs the error through the Notification APIs and stops the Task. After some time (the Acquire time), the lease container is scanned for leases that are up for taking, and the scan finds none. Eventually the processor hits a 404 on all leases and releases them, but the process is not deterministic: you cannot tell when the whole thing will complete.
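
The sequence above can be sketched as a small in-memory simulation. This is not the SDK: `LeaseStore`, `NotFound` and `run_worker` are hypothetical stand-ins, and the only behaviour taken from the description is that checkpoint and release are both Replace operations that fail with a 404 once the lease document is gone.

```python
class NotFound(Exception):
    """Stands in for an HTTP 404 from the lease container."""


class LeaseStore:
    """In-memory stand-in for a lease container (hypothetical, not the SDK)."""

    def __init__(self):
        self.docs = {}

    def create(self, lease_id, token):
        self.docs[lease_id] = {"token": token, "owner": None}

    def replace(self, lease_id, **changes):
        # A Replace only succeeds if the document still exists.
        if lease_id not in self.docs:
            raise NotFound(lease_id)
        self.docs[lease_id].update(changes)

    def delete_all(self):
        self.docs.clear()


def run_worker(store, lease_id, batches, log):
    """Checkpoint after each batch; return how many checkpoints succeeded."""
    done = 0
    try:
        for token in batches:
            store.replace(lease_id, token=token)  # checkpoint
            done += 1
    except NotFound:
        log.append("checkpoint failed: 404")
        try:
            store.replace(lease_id, owner=None)  # release is also a Replace
        except NotFound:
            log.append("release failed: 404")
    return done


def batches_with_purge(store):
    """Yield change feed batches; the leases are purged before the third one."""
    yield "t1"
    yield "t2"
    store.delete_all()  # someone deletes all leases while the worker is running
    yield "t3"


store = LeaseStore()
store.create("lease-0", token="t0")
log = []
done = run_worker(store, "lease-0", batches_with_purge(store), log)
acquirable = list(store.docs)  # what a later acquire scan would find
```

In this sketch the worker only notices the purge on its next checkpoint, and the later scan finds nothing to acquire. The simulation has a single worker; with several instances each one would notice at a different moment, which is where the non-determinism comes from.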

Having an API that deletes all leases does not guarantee that after the method completes, the running processors, if any, are reset. It requires coordination of instances.
