Random "Non-Deterministic workflow detected: A previous execution of this orchestration scheduled an activity task with sequence ID" error
Hi,
I hope you could point me in the right direction with this one…
I am getting intermittent/random errors such as this when executing an orchestration. Error details are shown below:
{ "name": "somename", "instanceId": "2c752ceb53fa4165a16ac4255d62b922", "runtimeStatus": "Failed", "input": null, "customStatus": null, "output": "**Non-Deterministic workflow detected: A previous execution of this orchestration scheduled an activity task with sequence ID 236 and name 'somename' (version ''), but the current replay execution hasn't (yet?) scheduled this task. Was a change made to the orchestrator code after this instance had already started running?**", "createdTime": "2023-07-14T10:57:31Z", "lastUpdatedTime": "2023-07-14T11:38:13Z" }
Orchestration code is as follows:
```csharp
[FunctionName("MyOrchestration")]
public async Task MyOrchestrationRun(
    [OrchestrationTrigger] IDurableOrchestrationContext context,
    ILogger logger,
    CancellationToken cancellationToken)
{
    var input = context.GetInput<RuntimeInput>();
    var replayLogger = context.CreateReplaySafeLogger(logger);

    if (!cancellationToken.IsCancellationRequested)
    {
        try
        {
            replayLogger.LogDebug($"[{context.InstanceId}] - Started orchestrator.");

            foreach (var id in input.Leads)
            {
                try
                {
                    await context.CallActivityWithRetryAsync(
                        "activityname",
                        new RetryOptions(TimeSpan.FromSeconds(3), 3) { BackoffCoefficient = 1 },
                        new Item()
                        {
                            //property population here
                        });
                }
                catch (Exception ex)
                {
                    replayLogger.LogError(ex, $"[{context.InstanceId}] - Processing for lead id {id} failed (all retries).");
                }
            }

            replayLogger.LogDebug($"[{context.InstanceId}] - Processed [{input.Leads.Count}] leads.");

            await context.CallActivityAsync("activityname", new CompleteOrchestrationRequest()
            {
                //property population here
            });
        }
        catch (Exception ex)
        {
            replayLogger.LogError($"[{context.InstanceId}] - Failed orchestrator instance id due to: {ex.Message}", ex);

            await context.CallActivityAsync("activityname", new CompleteOrchestrationRequest()
            {
                //property population here
            });

            throw;
        }
        finally
        {
            replayLogger.LogDebug($"[{context.InstanceId}] - Completing orchestrator...");
            await context.CallActivityAsync("activityname", context.InstanceId);
            replayLogger.LogDebug($"[{context.InstanceId}] - Completed orchestrator.");
        }
    }
    else
    {
        replayLogger.LogError($"[{context.InstanceId}] - Failed orchestrator instance id due to cancellation request");

        await context.CallActivityAsync("activityname", new CompleteOrchestrationRequest()
        {
            //property population here
        });

        throw new OperationCanceledException("Cancellation requested.");
    }
}
```
"extensions": { "durableTask": { "extendedSessionsEnabled": true, "extendedSessionIdleTimeoutInSeconds": 600, "hubName": "somenameTaskHub", "storageProvider": { "partitionCount": 8 } } }
I am testing this orchestration with the provided suspend/resume API calls. When left to run without interruptions, the orchestration usually completes without errors. However, when I suspend and resume the orchestration multiple times, simulating user behavior, it fails randomly with the above error. I have tried to figure out what could be the root cause of the non-deterministic execution behavior, without success. I have not been able to reproduce the issue consistently, as it appears randomly. The number of leads/iterations ranges from a few hundred to 20-30K… In order to be able to utilize the existing suspend and resume API, I am not using fan-out.
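For context, the suspend/resume test calls look roughly like the following. This is a minimal sketch (not the exact code from this app), assuming the `SuspendAsync`/`ResumeAsync` client APIs available in recent versions of the Durable Functions extension; the function names, routes, and reason strings are hypothetical.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class OrchestrationControlApi
{
    // Hypothetical HTTP endpoint that pauses a running orchestration instance.
    [FunctionName("SuspendOrchestration")]
    public static async Task<IActionResult> Suspend(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = "orchestrations/{instanceId}/suspend")] HttpRequest req,
        [DurableClient] IDurableOrchestrationClient client,
        string instanceId)
    {
        // Suspends the instance; further work is held until it is resumed.
        await client.SuspendAsync(instanceId, "User requested pause");
        return new AcceptedResult();
    }

    // Hypothetical HTTP endpoint that resumes a previously suspended instance.
    [FunctionName("ResumeOrchestration")]
    public static async Task<IActionResult> Resume(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = "orchestrations/{instanceId}/resume")] HttpRequest req,
        [DurableClient] IDurableOrchestrationClient client,
        string instanceId)
    {
        await client.ResumeAsync(instanceId, "User requested resume");
        return new AcceptedResult();
    }
}
```

In the failing test runs, these suspend/resume calls are issued repeatedly against a single long-running instance while its activity loop is in flight.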
I am using .NET 6.0, v4.
Thanks, Anel
Hi @anel-al, thank you for providing the additional information! Yes, providing the instance ids + region + timestamps is enough and you don’t need to provide the app name. I looked at Instance Id 035f5f742964485590e1bf92fd9459d3 at 2023-07-14T18:19 and see that the TaskScheduled message is processed at the same time as the ExecutionSuspended message and before the ExecutionSuspended event is saved to history. This looks like a bug we will need to fix.
@bachuv thank you. Let me know if I can provide additional details about the issue.