Random "Non-Deterministic workflow detected: A previous execution of this orchestration scheduled an activity task with sequence ID" error

Hi,

I hope you could point me in the right direction with this one…

I am getting intermittent, seemingly random errors like the one below when executing the orchestration:

{
    "name": "somename",
    "instanceId": "2c752ceb53fa4165a16ac4255d62b922",
    "runtimeStatus": "Failed",
    "input": null,
    "customStatus": null,
    "output": "Non-Deterministic workflow detected: A previous execution of this orchestration scheduled an activity task with sequence ID 236 and name 'somename' (version ''), but the current replay execution hasn't (yet?) scheduled this task. Was a change made to the orchestrator code after this instance had already started running?",
    "createdTime": "2023-07-14T10:57:31Z",
    "lastUpdatedTime": "2023-07-14T11:38:13Z"
}

Orchestration code is as follows:

`
    [FunctionName("MyOrchestration")]
    public async Task MyOrchestrationRun(
        [OrchestrationTrigger] IDurableOrchestrationContext context,
        ILogger logger,
        CancellationToken cancellationToken)
    {
        var input = context.GetInput<RuntimeInput>();

        var replayLogger = context.CreateReplaySafeLogger(logger);

        if (!cancellationToken.IsCancellationRequested)
        {
            try
            {
                replayLogger.LogDebug($"[{context.InstanceId}] - Started orchestrator.");
                foreach (var id in input.Leads)
                {
                    try
                    {
                        await context.CallActivityWithRetryAsync("activityname", new RetryOptions(TimeSpan.FromSeconds(3), 3) { BackoffCoefficient = 1 }, new Item()
                        {
                            //property population here
                        });
                    }
                    catch (Exception ex)
                    {
                        replayLogger.LogError(ex, $"[{context.InstanceId}] - Processing for lead id {id} failed (all retries).");
                    }
                }

                replayLogger.LogDebug($"[{context.InstanceId}] - Processed [{input.Leads.Count}] leads.");

                await context.CallActivityAsync("activityname", new CompleteOrchestrationRequest()
                {
                     //property population here
                });
            }
            catch (Exception ex)
            {
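                // Pass the exception as the first argument so it is logged as an exception, not treated as a format argument.
                replayLogger.LogError(ex, $"[{context.InstanceId}] - Failed orchestrator instance id due to: {ex.Message}");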

                await context.CallActivityAsync("activityname", new CompleteOrchestrationRequest()
                {
                    //property population here
                });

                throw;
            }
            finally
            {
                replayLogger.LogDebug($"[{context.InstanceId}] - Completing orchestrator...");

                await context.CallActivityAsync("activityname", context.InstanceId);

                replayLogger.LogDebug($"[{context.InstanceId}] - Completed orchestrator.");
            }
        }
        else
        {
            replayLogger.LogError($"[{context.InstanceId}] - Failed orchestrator instance id due to cancellation request");

            await context.CallActivityAsync("activityname", new CompleteOrchestrationRequest()
            {
               //property population here
            });

            throw new OperationCanceledException("Cancellation requested.");
        }
    }
}

`

"extensions": {
    "durableTask": {
        "extendedSessionsEnabled": true,
        "extendedSessionIdleTimeoutInSeconds": 600,
        "hubName": "somenameTaskHub",
        "storageProvider": {
            "partitionCount": 8
        }
    }
}

I am testing this orchestration with the provided suspend/resume API calls. When left to run without interruptions, the orchestration usually completes without errors. However, when I suspend and resume the orchestration multiple times to simulate user behavior, it fails randomly with the error above. I tried to figure out the root cause of the non-deterministic execution behavior, without success. I was not able to reproduce the issue consistently, as it appears randomly. The number of leads (iterations) ranges from a few hundred to 20-30K. To be able to use the existing suspend and resume API, I am not using fan-out.

I am using .NET 6.0, V4.

Thanks, Anel
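The suspend/resume testing described above could be scripted roughly as follows. This is only a sketch: it assumes the calls go through the Durable Functions built-in HTTP management endpoints (`.../instances/{instanceId}/suspend` and `.../resume`), which the issue does not confirm. The class name `SuspendResumeRepro`, the base URL, the system key, and the cycle count and pause are all hypothetical placeholders.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class SuspendResumeRepro
{
    // Builds the management URL for an action ("suspend" or "resume").
    // baseUrl and systemKey are placeholders for your own function app values.
    public static Uri BuildUri(string baseUrl, string instanceId, string action, string reason, string systemKey) =>
        new Uri(
            $"{baseUrl.TrimEnd('/')}/runtime/webhooks/durabletask/instances/{Uri.EscapeDataString(instanceId)}/{action}" +
            $"?reason={Uri.EscapeDataString(reason)}&code={Uri.EscapeDataString(systemKey)}");

    // Alternates suspend and resume requests, waiting `pause` after each one.
    public static async Task ToggleAsync(
        HttpClient http, string baseUrl, string instanceId, string systemKey, int cycles, TimeSpan pause)
    {
        for (int i = 0; i < cycles; i++)
        {
            foreach (var action in new[] { "suspend", "resume" })
            {
                var uri = BuildUri(baseUrl, instanceId, action, $"repro cycle {i}", systemKey);
                using var response = await http.PostAsync(uri, content: null);
                Console.WriteLine($"{action} #{i}: HTTP {(int)response.StatusCode}");
                await Task.Delay(pause);
            }
        }
    }
}
```

Running `ToggleAsync` against an instance that is iterating over a large lead list, with a short pause, approximates the "suspend and resume multiple times" scenario. Since the failure is reported as intermittent, several runs may be needed before it shows up.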

Issue Analytics

  • State: open
  • Created 2 months ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

bachuv commented, Aug 7, 2023 (1 reaction)

Hi @anel-al, thank you for providing the additional information! Yes, providing the instance ids + region + timestamps is enough and you don’t need to provide the app name. I looked at Instance Id 035f5f742964485590e1bf92fd9459d3 at 2023-07-14T18:19 and see that the TaskScheduled message is processed at the same time as the ExecutionSuspended message and before the ExecutionSuspended event is saved to history. This looks like a bug we will need to fix.

anel-al commented, Aug 10, 2023 (0 reactions)

@bachuv thank you. Let me know if I can provide additional details about the issue.
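To make the error message and the maintainer's diagnosis easier to picture, here is a toy model of a replay check. It is not the Durable Task Framework's implementation. It assumes a simplified orchestrator that schedules task 0 at start and schedules task N+1 when the completion of task N is delivered, and it assumes that completions arriving while the instance is suspended are held back until it resumes. The names `ReplaySketch`, `HistoryEvent`, and `EventKind` are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public enum EventKind { TaskScheduled, TaskCompleted, ExecutionSuspended, ExecutionResumed }

// One entry in the (simplified) orchestration history.
public record HistoryEvent(EventKind Kind, int SequenceId = -1);

public static class ReplaySketch
{
    // Replays the history against the toy orchestrator.
    // Returns null if every TaskScheduled event matches a task the replay scheduled,
    // otherwise the sequence ID of the first task the replay had not scheduled.
    public static int? Replay(IEnumerable<HistoryEvent> history)
    {
        var pending = new HashSet<int> { 0 };   // tasks scheduled by the replay, not yet matched in history
        var buffered = new Queue<int>();        // completions held back while suspended
        bool suspended = false;

        foreach (var e in history)
        {
            switch (e.Kind)
            {
                case EventKind.ExecutionSuspended:
                    suspended = true;
                    break;
                case EventKind.ExecutionResumed:
                    suspended = false;
                    // Deliver held-back completions; each one schedules the next task.
                    while (buffered.Count > 0) pending.Add(buffered.Dequeue() + 1);
                    break;
                case EventKind.TaskCompleted:
                    if (suspended) buffered.Enqueue(e.SequenceId);
                    else pending.Add(e.SequenceId + 1);
                    break;
                case EventKind.TaskScheduled:
                    // History says this task was scheduled; the replay must agree.
                    if (!pending.Remove(e.SequenceId)) return e.SequenceId;
                    break;
            }
        }
        return null;
    }
}

public static class Program
{
    public static void Main()
    {
        var inOrder = new[]
        {
            new HistoryEvent(EventKind.TaskScheduled, 0),
            new HistoryEvent(EventKind.TaskCompleted, 0),
            new HistoryEvent(EventKind.TaskScheduled, 1),
            new HistoryEvent(EventKind.ExecutionSuspended),
            new HistoryEvent(EventKind.ExecutionResumed),
        };
        // Task 1 was scheduled by the original run, but the history records the
        // suspension ahead of the completion that triggers it.
        var raced = new[]
        {
            new HistoryEvent(EventKind.TaskScheduled, 0),
            new HistoryEvent(EventKind.ExecutionSuspended),
            new HistoryEvent(EventKind.TaskCompleted, 0),
            new HistoryEvent(EventKind.TaskScheduled, 1),
        };

        int? first = ReplaySketch.Replay(inOrder);
        int? second = ReplaySketch.Replay(raced);
        Console.WriteLine(first.HasValue ? $"mismatch at {first}" : "ok");    // prints "ok"
        Console.WriteLine(second.HasValue ? $"mismatch at {second}" : "ok");  // prints "mismatch at 1"
    }
}
```

In this model the orchestrator code itself never changes. The mismatch comes purely from the order in which a suspension and a task event end up in the history, which is one way to read the maintainer's description of the race.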

Top Results From Across the Web

Orchestrator fails randomly with "Non-Deterministic ...
In most cases, a non-deterministic orchestration is caused by incorrectly written orchestrator function code. Can you share a sample of what ...

Non-Deterministic workflow detected in Durable Functions
The latest hiccup I have that I can't figure out how to get around is: Non-Deterministic workflow detected: TaskScheduledEvent.

Durable activity sometimes detected as Non-Deterministic ...
The orchestrator is doing a foreach loop. First, it calls an activity to do an update on a SQL Server database. Then a...

Why can't Workflows contain non deterministic code? And ...
If the workflow code is non deterministic then it can end up in a different state than the original execution. For example, the...

Understanding Azure Durable Functions - Part 3
Essentially what this means is the code in the orchestrator function may execute multiple times per invocation (be replayed) and the ...