Orchestration getting stuck while acquiring the lock
See original GitHub issue
Description
The full description, with code snippets, screenshots, issue samples, etc., is here: https://github.com/Azure/azure-functions-durable-extension/discussions/2530
Expected behavior
Acquire the lock in seconds at most, not minutes or hours.
Actual behavior
The orchestration appears to get stuck while acquiring the locks of the entities involved in the orchestration.
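For context, here is a minimal sketch of the critical-section pattern in question; the entity and type names are hypothetical placeholders, not the app's actual code. The orchestrator locks the entities it needs via IDurableOrchestrationContext.LockAsync before calling them.

```csharp
// Minimal sketch of the locking pattern; entity and type names are hypothetical.
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public record OrderInput(string Id, string LocationToken);   // hypothetical input type

public static class ProcessOrderOrchestration
{
    [FunctionName("ProcessOrderOrchestration")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        OrderInput order = context.GetInput<OrderInput>();
        var orderEntity = new EntityId("OrderEntity", order.Id);
        var locationEntity = new EntityId("LocationEntity", order.LocationToken);

        // Expected: this await resolves in seconds.
        // Observed: instances sit here for hours before the lock is granted.
        using (await context.LockAsync(orderEntity, locationEntity))
        {
            await context.CallEntityAsync(orderEntity, "Process", order);
        }
    }
}
```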
Known workarounds
Reset the durable task storage account and the function app storage account. A lighter-weight variant is sketched below.
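As a rough illustration of that workaround, the sketch below wipes a task hub's data (queues, tables, and blob containers) instead of recreating the storage accounts. It assumes the default Azure Storage backend naming, where the task hub's resources share the task hub name as a prefix; the connection setting and task hub name are placeholders.

```csharp
// Sketch: clear all task hub resources in the storage account by prefix.
// Assumes default Azure Storage backend naming; names below are placeholders.
using System;
using Azure.Data.Tables;
using Azure.Storage.Blobs;
using Azure.Storage.Queues;

string connectionString = Environment.GetEnvironmentVariable("AzureWebJobsStorage");
string taskHubPrefix = "mytaskhub";   // placeholder task hub name

var queueService = new QueueServiceClient(connectionString);
foreach (var queue in queueService.GetQueues())
    if (queue.Name.StartsWith(taskHubPrefix, StringComparison.OrdinalIgnoreCase))
        queueService.DeleteQueue(queue.Name);              // control + work-item queues

var tableService = new TableServiceClient(connectionString);
foreach (var table in tableService.Query())
    if (table.Name.StartsWith(taskHubPrefix, StringComparison.OrdinalIgnoreCase))
        tableService.DeleteTable(table.Name);              // Instances and History tables

var blobService = new BlobServiceClient(connectionString);
foreach (var container in blobService.GetBlobContainers())
    if (container.Name.StartsWith(taskHubPrefix, StringComparison.OrdinalIgnoreCase))
        blobService.DeleteBlobContainer(container.Name);   // lease / large-message containers
```

Note that this deletes all in-flight orchestration state for the task hub, so it is only appropriate when the app is already stuck and losing that state is acceptable.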
App Details
- Durable Functions extension version (e.g. v1.8.3): 2.10.0
- Azure Functions runtime version (1.0 or 2.0): 2
- Programming language used: C#
If deployed to Azure
- Timeframe issue observed:
- Function App name: orders-saga
- Azure region: West US
- Azure storage account name: ordersagav2
Issue Analytics
- State:
- Created: a month ago
- Comments: 32 (14 by maintainers)
Hi, sorry for the late response. We are still working to identify the root cause.
First, for the Azure Storage backend with the old partition manager: with the orchestration instance ID you provided, it looks like the partition for control queue 01 could not be handed over to another worker for several hours. @davidmrdavid is working on a private package to temporarily mitigate this case, and we will hopefully get that private package to you tomorrow.
Since the issue hits on both versions of the partition manager, we don't know whether the root cause above is the same in the new partition manager. Could you provide any orchestration instance ID / task hub / timestamp that hit this issue with partition manager V3? It's fine even if the storage account has been deleted, since we keep the Kusto logs in a separate storage account. That would help me identify the cause in the partition manager V3 scenario.
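(For reference, partition manager V3, the table partition manager, is opted into through the storageProvider section of host.json. The snippet below is a sketch; confirm the exact setting name against the docs for your extension version.)

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "storageProvider": {
        "useTablePartitionManagement": true
      }
    }
  }
}
```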
It's not intermittent; it was happening with basically every request processed by the orchestrator. Here are two more examples where instances were stuck for hours.
{"id":"3107849517381639","businessDate":"2023-08-07T00:00:00","locationToken":"5zAQ1KZzYkqmea9XbBkLbA=="}
This one was stuck for more than 10 hours.
{"id":"3107849515304964","businessDate":"2023-08-07T00:00:00","locationToken":"5zAQ1KZzYkqmea9XbBkLbA=="}
And this one for more than 6 hours. As a result, our processed-orders rate was affected; you can see here how dramatically it dropped because the orchestrator was holding almost all of the orders.
@davidmrdavid, please let me know if the information provided helps.
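For anyone following along, one way to confirm how long an instance has been stuck is to query its status from a client function; the sketch below uses hypothetical function and route names.

```csharp
// Sketch: report how long an orchestration instance has been sitting in the
// Running state. Function name, route, and "StuckFor" field are hypothetical.
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class CheckInstance
{
    [FunctionName("CheckInstance")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get", Route = "instances/{instanceId}")] HttpRequest req,
        string instanceId,
        [DurableClient] IDurableOrchestrationClient client)
    {
        DurableOrchestrationStatus status = await client.GetStatusAsync(instanceId);
        if (status == null)
        {
            return new NotFoundResult();
        }

        // While the orchestrator waits on LockAsync, RuntimeStatus stays "Running"
        // and LastUpdatedTime typically stops advancing.
        return new OkObjectResult(new
        {
            status.RuntimeStatus,
            status.CreatedTime,
            status.LastUpdatedTime,
            StuckFor = DateTime.UtcNow - status.LastUpdatedTime
        });
    }
}
```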