Orchestration getting stuck while acquiring the lock
See original GitHub issue
Description
The full description, with code snippets, screenshots, issue samples, etc., is here: https://github.com/Azure/azure-functions-durable-extension/discussions/2530
Expected behavior
Acquire the lock in seconds at most, not minutes or hours.
Actual behavior
The orchestration appears to get stuck while acquiring the locks of the entities involved in the orchestration.
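For context, here is a minimal sketch of the critical-section pattern in question; the entity and type names are hypothetical placeholders, not the app's actual code. The orchestrator locks the entities it needs via IDurableOrchestrationContext.LockAsync before calling them.

```csharp
// Minimal sketch of the locking pattern; entity and type names are hypothetical.
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public record OrderInput(string Id, string LocationToken);   // hypothetical input type

public static class ProcessOrderOrchestration
{
    [FunctionName("ProcessOrderOrchestration")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        OrderInput order = context.GetInput<OrderInput>();
        var orderEntity = new EntityId("OrderEntity", order.Id);
        var locationEntity = new EntityId("LocationEntity", order.LocationToken);

        // Expected: this await resolves in seconds.
        // Observed: instances sit here for hours before the lock is granted.
        using (await context.LockAsync(orderEntity, locationEntity))
        {
            await context.CallEntityAsync(orderEntity, "Process", order);
        }
    }
}
```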
Known workarounds
Reset the durable task storage account and the function app storage account. A lighter-weight variant is sketched below.
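As a rough illustration of that workaround, the sketch below wipes a task hub's data (queues, tables, and blob containers) instead of recreating the storage accounts. It assumes the default Azure Storage backend naming, where the task hub's resources share the task hub name as a prefix; the connection setting and task hub name are placeholders.

```csharp
// Sketch: clear all task hub resources in the storage account by prefix.
// Assumes default Azure Storage backend naming; names below are placeholders.
using System;
using Azure.Data.Tables;
using Azure.Storage.Blobs;
using Azure.Storage.Queues;

string connectionString = Environment.GetEnvironmentVariable("AzureWebJobsStorage");
string taskHubPrefix = "mytaskhub";   // placeholder task hub name

var queueService = new QueueServiceClient(connectionString);
foreach (var queue in queueService.GetQueues())
    if (queue.Name.StartsWith(taskHubPrefix, StringComparison.OrdinalIgnoreCase))
        queueService.DeleteQueue(queue.Name);              // control + work-item queues

var tableService = new TableServiceClient(connectionString);
foreach (var table in tableService.Query())
    if (table.Name.StartsWith(taskHubPrefix, StringComparison.OrdinalIgnoreCase))
        tableService.DeleteTable(table.Name);              // Instances and History tables

var blobService = new BlobServiceClient(connectionString);
foreach (var container in blobService.GetBlobContainers())
    if (container.Name.StartsWith(taskHubPrefix, StringComparison.OrdinalIgnoreCase))
        blobService.DeleteBlobContainer(container.Name);   // lease / large-message containers
```

Note that this deletes all in-flight orchestration state for the task hub, so it is only appropriate when the app is already stuck and losing that state is acceptable.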
App Details
- Durable Functions extension version (e.g. v1.8.3): 2.10.0
- Azure Functions runtime version (1.0 or 2.0): 2
- Programming language used: C#
If deployed to Azure
- Timeframe issue observed:
- Function App name: orders-saga
- Azure region: West US
- Azure storage account name: ordersagav2
Issue Analytics
- State:
- Created: a month ago
- Comments: 32 (14 by maintainers)
Hi, sorry for the late response. We are still working to identify the root cause.
First, for the Azure Storage backend with the old partition manager: with the orchestration instance ID you provided, it looks like the partition for control queue 01 could not be handed over to another worker for several hours. @davidmrdavid is working on a private package to temporarily mitigate this case, and we will hopefully get that private package to you tomorrow.
Since the issue hits on both versions of the partition manager, we don't know whether the root cause above is the same in the new partition manager. Could you provide any orchestration instance ID / task hub / timestamp that hit this issue with partition manager V3? It's fine even if the storage account has been deleted, since we keep the Kusto logs in a separate storage account. That would help me identify the cause in the partition manager V3 scenario.
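(For reference, partition manager V3, the table partition manager, is opted into through the storageProvider section of host.json. The snippet below is a sketch; confirm the exact setting name against the docs for your extension version.)

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "storageProvider": {
        "useTablePartitionManagement": true
      }
    }
  }
}
```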
It's not intermittent; it was happening with basically every request processed by the orchestrator. Here are two more examples where instances were stuck for hours.
{"id":"3107849517381639","businessDate":"2023-08-07T00:00:00","locationToken":"5zAQ1KZzYkqmea9XbBkLbA=="}
This one was stuck for more than 10 hours.
{"id":"3107849515304964","businessDate":"2023-08-07T00:00:00","locationToken":"5zAQ1KZzYkqmea9XbBkLbA=="}
And this one for more than 6 hours. As a result, our processed-orders rate was affected; you can see here how dramatically it dropped because the orchestrator was holding almost all of the orders.
@davidmrdavid, please let me know if the information provided helps.
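For anyone following along, one way to confirm how long an instance has been stuck is to query its status from a client function; the sketch below uses hypothetical function and route names.

```csharp
// Sketch: report how long an orchestration instance has been sitting in the
// Running state. Function name, route, and "StuckFor" field are hypothetical.
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class CheckInstance
{
    [FunctionName("CheckInstance")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get", Route = "instances/{instanceId}")] HttpRequest req,
        string instanceId,
        [DurableClient] IDurableOrchestrationClient client)
    {
        DurableOrchestrationStatus status = await client.GetStatusAsync(instanceId);
        if (status == null)
        {
            return new NotFoundResult();
        }

        // While the orchestrator waits on LockAsync, RuntimeStatus stays "Running"
        // and LastUpdatedTime typically stops advancing.
        return new OkObjectResult(new
        {
            status.RuntimeStatus,
            status.CreatedTime,
            status.LastUpdatedTime,
            StuckFor = DateTime.UtcNow - status.LastUpdatedTime
        });
    }
}
```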