CosmosDB trigger loses document if function app crashes
If the function app crashes (is stopped/killed) while processing a document/batch (using the CosmosDB trigger change feed), the trigger continues with the next document/batch once the app is up again, effectively losing that batch.
For mission-critical operations this is not acceptable, since there is currently no way to handle or even detect the lost batch.
Repro steps
- Start a function with the Cosmos DB trigger (a minimal example function is sketched after these steps)
- Update a document in cosmos
- End the function process (kill it) while it’s processing the event
- Start it again
- Notice that it does not retry the same document again.
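For reference, a minimal trigger function along the lines of what the repro assumes. This is a sketch using the v3 extension API; the database/collection names, the `CosmosDBConnection` setting, and the artificial delay are placeholders, not part of the original report.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ChangeFeedFunction
{
    // Minimal CosmosDB trigger (extension v3.x). "MyDatabase", "Items", "leases"
    // and "CosmosDBConnection" are placeholder names; substitute your own.
    [FunctionName("ChangeFeedFunction")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "MyDatabase",
            collectionName: "Items",
            ConnectionStringSetting = "CosmosDBConnection",
            LeaseCollectionName = "leases",
            CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> documents,
        ILogger log)
    {
        // Simulate slow processing so the process can be killed mid-batch (repro step 3).
        System.Threading.Thread.Sleep(30_000);
        log.LogInformation("Processed {Count} document(s)", documents.Count);
    }
}
```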
Expected behavior
The trigger should deliver the same document again until the function invocation completes successfully.
A good middle ground would be to behave the way the Event Hub trigger does (the Event Hub trigger re-delivers the same event only if the app does not respond at all).
Without knowing the internals, it seems to me that the trigger should only update the change feed checkpoint after the function has completed, not before (which appears to be what happens today).
Actual behavior
The function does not receive the same document event again; instead, the next document change is delivered, so the original change is lost.
Known workarounds
It seems that if you add a retry policy such as [FixedDelayRetry] to your function, the checkpoint is kept properly. This works at least when you Ctrl-C the app; see the sketch below.
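A sketch of that workaround, assuming a Functions host/SDK version that supports retry policies; the retry count, delay, and binding names are illustrative only.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ChangeFeedWithRetry
{
    // FixedDelayRetry(maxRetryCount, delayInterval) retries the invocation before
    // the trigger advances; per the workaround above, it also appears to preserve
    // the checkpoint when the host is stopped mid-batch (at least via Ctrl-C).
    [FunctionName("ChangeFeedWithRetry")]
    [FixedDelayRetry(5, "00:00:10")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "MyDatabase",
            collectionName: "Items",
            ConnectionStringSetting = "CosmosDBConnection",
            LeaseCollectionName = "leases")] IReadOnlyList<Document> documents,
        ILogger log)
    {
        log.LogInformation("Processing {Count} document(s)", documents.Count);
        // ... your processing logic ...
    }
}
```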
Related information
Microsoft.Azure.WebJobs.Extensions.CosmosDB 3.0.10
Top GitHub Comments
@ealsur you are right about Cosmos and deletes, I forgot since we always use soft delete with TTL when we need to record deletes in the ChangeFeed.
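For context, a hypothetical sketch of that soft-delete-with-TTL pattern (the property names and TTL value are assumptions, not from the original comment): instead of deleting an item outright (deletes never appear in the change feed), mark it as deleted and set a per-item TTL so Cosmos DB purges it later, while the update itself flows through the change feed.

```csharp
using Newtonsoft.Json;

public class Item
{
    [JsonProperty("id")]
    public string Id { get; set; }

    // Assumed application-level flag that change feed consumers treat as a delete.
    [JsonProperty("deleted")]
    public bool Deleted { get; set; }

    // Cosmos DB per-item TTL in seconds; the container must have TTL enabled.
    [JsonProperty("ttl", NullValueHandling = NullValueHandling.Ignore)]
    public int? Ttl { get; set; }
}

// "Soft delete": the replace shows up in the change feed, then the item expires.
// item.Deleted = true;
// item.Ttl = 60;
// await container.ReplaceItemAsync(item, item.Id, new PartitionKey(item.Id));
```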
Having to call the management API just to disable a function, I think, defeats the purpose of Functions and Triggers being easy and fast to develop and use. I’m not saying it can’t be done, but it’s a lot of hoops to jump through for something that should be built in.
The circuit breaker pattern is old and, if you ask me, it should have been implemented in the Functions runtime, at least as an optional configuration. Now it seems the team that writes the triggers has to balance protecting unwitting developers from runaway cost on consumption plans against delivering resilient and easy-to-use features.
I don’t think it is possible to code custom logic that makes a function disable itself via the management API, because such logic will have to sit in the outermost try-catch to be robust. That try-catch is in the Functions runtime, so it would need to be implemented in the runtime itself.