CosmosDB trigger loses document if function app crashes


If the function app crashes (is stopped or killed) while it is processing a document or batch from the CosmosDB trigger's change feed, the trigger continues with the next document or batch once the app is back up. The batch that was in flight is effectively lost.

For mission-critical operations this is not acceptable, because there is currently no way to handle this situation.

Repro steps

  1. Start a function with the Cosmos DB trigger.
  2. Update a document in Cosmos DB.
  3. Kill the function process while it is processing the change.
  4. Start the function again.
  5. Notice that it does not retry the same document.

Expected behavior

The trigger should send the same document again until the function returns.

A good middle ground would be to match the Event Hub trigger, which retries the same event only if the app does not respond at all.

Without knowing the details, it seems to me that the trigger should update the checkpoint only after the function has completed, not before, which is what it appears to do now.
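To illustrate why the order matters, here is a minimal Python simulation under stated assumptions. It does not use any Azure SDK: ChangeFeed, Crash, and simulate are hypothetical names, the lease checkpoint is modeled as a single integer offset that survives a restart, and a killed process is modeled as an exception raised while a change is being handled.

```python
class Crash(Exception):
    """Stands in for the host process being killed mid-batch (hypothetical)."""


class ChangeFeed:
    """Hypothetical change feed: a list of changes plus a persisted checkpoint."""

    def __init__(self, changes):
        self.changes = list(changes)
        self.checkpoint = 0  # index of the next change to deliver; survives restarts

    def run(self, handler, checkpoint_first, crash_on=None):
        """Deliver changes from the checkpoint onward, one change per batch.

        crash_on: a change that makes the simulated process die while handling it.
        """
        while self.checkpoint < len(self.changes):
            change = self.changes[self.checkpoint]
            if checkpoint_first:
                # Checkpoint advanced BEFORE the handler runs (the suspected behavior).
                self.checkpoint += 1
            if change == crash_on:
                raise Crash(change)
            handler(change)
            if not checkpoint_first:
                # Checkpoint advanced only AFTER the handler returns (at-least-once).
                self.checkpoint += 1


def simulate(checkpoint_first):
    feed = ChangeFeed(["doc-1", "doc-2", "doc-3"])
    processed = []
    try:
        feed.run(processed.append, checkpoint_first, crash_on="doc-2")
    except Crash:
        pass  # the host restarts; only feed.checkpoint is kept
    feed.run(processed.append, checkpoint_first)  # restart, no crash this time
    return processed


print(simulate(checkpoint_first=True))   # ['doc-1', 'doc-3']  (doc-2 is lost)
print(simulate(checkpoint_first=False))  # ['doc-1', 'doc-2', 'doc-3']
```

With the checkpoint written after the handler returns, a change that was only partly processed before the crash is delivered again. That is "at least once" delivery, so handlers then need to tolerate seeing the same change more than once.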

Actual behavior

The function does not receive the same document change again; instead, it is triggered for the next change, so the interrupted change is lost.

Known workarounds

Adding a retry policy such as [FixedDelayRetry] to the function appears to keep the checkpoint intact. This works at least when the app is stopped with Ctrl+C.
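The workaround relies on the function being invoked again for the same batch before the trigger moves on. A generic fixed-delay retry wrapper can be sketched in Python as follows. This is not how the Functions host implements [FixedDelayRetry]; fixed_delay_retry, max_retry_count, and delay_seconds are hypothetical names that only loosely mirror the idea of a maximum retry count and a fixed delay interval, and the sleep function is injectable so the example runs instantly.

```python
import time


def fixed_delay_retry(handler, max_retry_count, delay_seconds, sleep=time.sleep):
    """Hypothetical wrapper: call handler, retrying up to max_retry_count times."""

    def wrapped(batch):
        attempt = 0
        while True:
            try:
                return handler(batch)
            except Exception:
                if attempt >= max_retry_count:
                    raise  # retries exhausted; the caller decides what happens next
                attempt += 1
                sleep(delay_seconds)  # same fixed delay before every retry

    return wrapped


calls = []


def flaky(batch):
    """Hypothetical handler that fails twice with a transient error, then succeeds."""
    calls.append(batch)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return f"processed {batch}"


# sleep is replaced with a no-op so the example does not actually wait.
handler = fixed_delay_retry(flaky, max_retry_count=4, delay_seconds=10, sleep=lambda s: None)
print(handler("batch-1"))  # processed batch-1
print(len(calls))          # 3 (two failures, then one success)
```

In a loop that checkpoints only after the handler returns, the checkpoint stays put while retries are in progress. Once retries are exhausted the exception propagates, and the caller decides whether to skip the batch or stop.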

Related information

microsoft.azure.webjobs.extensions.cosmosdb\3.0.10

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 13 (5 by maintainers)

Top GitHub Comments

1 reaction
CodeMonkey321 commented, Jul 15, 2021

@ealsur you are right about Cosmos DB and deletes. I forgot, since we always use soft deletes with TTL when we need to record deletes in the change feed.

Having to call the management API just to disable a function defeats, I think, the purpose of Functions and triggers being easy and fast to develop and use. I'm not saying it can't be done, but it's a lot of hoops to jump through for something that should be built in.

The circuit breaker pattern is well established, and if you ask me it should have been built into the Functions runtime, at least as an optional configuration. As it stands, the team that writes the triggers seems to have to balance protecting unwitting developers from runaway costs on Consumption plans against delivering resilient, easy-to-use features.
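For reference, a minimal circuit breaker can be sketched in Python. This is a generic illustration of the pattern, not a feature of the Functions runtime; CircuitBreaker, CircuitOpenError, failure_threshold, and reset_timeout are hypothetical names, and the clock is injectable so the example is deterministic. After failure_threshold consecutive failures the breaker opens and rejects calls until reset_timeout seconds have passed, then lets one trial call through (the half-open state).

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the protected function while the circuit is open."""


class CircuitBreaker:
    """Minimal circuit breaker sketch (hypothetical, not a Functions runtime feature)."""

    def __init__(self, failure_threshold, reset_timeout, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; skipping call")
            # Half-open: the timeout has elapsed, so allow one trial call.
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        self.failures = 0
        self.opened_at = None  # a successful call closes the circuit
        return result


now = [0.0]  # fake clock value, in seconds
breaker = CircuitBreaker(failure_threshold=3, reset_timeout=60, clock=lambda: now[0])


def downstream(doc):
    """Hypothetical dependency that is currently failing."""
    raise ConnectionError("dependency down")


for _ in range(3):
    try:
        breaker.call(downstream, "doc")
    except ConnectionError:
        pass  # third consecutive failure opens the circuit

try:
    breaker.call(downstream, "doc")
except CircuitOpenError:
    print("open")  # rejected without reaching downstream

now[0] = 61.0
print(breaker.call(str.upper, "doc"))  # DOC (half-open trial succeeds, circuit closes)
```

In the scenario discussed here, the open state would play the role of disabling the function so that a failing dependency stops consuming executions.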

0 reactions
jesperkristensen commented, Jul 22, 2021

I don’t think it is possible to code custom logic that makes a function disable itself via the management API, because such logic will have to sit in the outermost try-catch to be robust. That try-catch is in the Functions runtime, so it would need to be implemented in the runtime itself.


