Eternal orchestration eventually stops calling activity functions
Description
I have a durable function that uses the singleton pattern to ensure only one instance of it ever runs. I’ve created a simplified version below that calls an activity function to simulate some work and then waits for an external event. Once the external event arrives, it uses ContinueAsNew to start over again.
I’m using the external event because the trigger that starts the orchestrator can fire at any moment. If the orchestrator is already running when the trigger fires, I want it to start over as soon as it has completed its current run.
This seems to run perfectly; however, at some point the orchestrator just stops calling any activities. I know this is incredibly vague, but I can’t find any pattern or reason as to why this happens. Checking the logs, I see an entry saying that the trigger function is sending the wake event to the orchestrator, then “Executing orchestrator” and then “Executed orchestrator”. I see no log entries saying it has run the activity.
The orchestrator itself must still be running, because it responds to the external event every time the trigger function sends it. It just doesn’t run the activity.
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

namespace GJTest.Function
{
    public static class DurableFunctionsOrchestrationCSharp1
    {
        // Fixed instance ID so that only one orchestration instance ever exists (singleton pattern).
        const string _instanceId = "some-unique-instance-id";
        const string _eventIdentifier = "NewEvents";

        [FunctionName(nameof(RunOrchestrator))]
        public static async Task RunOrchestrator(
            [OrchestrationTrigger] IDurableOrchestrationContext context, ILogger log)
        {
            // Do one unit of work, wait for the wake event, then restart with a fresh history.
            await context.CallActivityAsync(nameof(SimulateWork), null);
            await context.WaitForExternalEvent(_eventIdentifier);
            context.ContinueAsNew(null);
        }

        [FunctionName(nameof(SimulateWork))]
        public static async Task SimulateWork([ActivityTrigger] IDurableActivityContext context, ILogger log)
        {
            log.LogInformation("Simulating some work");
            await Task.Delay(TimeSpan.FromSeconds(5));
            log.LogInformation("Finished simulating some work");
        }

        [FunctionName("HttpTriggerOrchestrator")]
        public static async Task<HttpResponseMessage> HttpStart(
            [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post")] HttpRequestMessage req,
            [DurableClient] IDurableOrchestrationClient starter,
            ILogger log)
        {
            if (await OrchestrationHasFinished(starter, log))
            {
                log.LogInformation("Starting a new instance of orchestration");
                await starter.StartNewAsync(nameof(RunOrchestrator), _instanceId);
            }
            else
            {
                log.LogInformation("Orchestrator already running, sending wake event");
                await starter.RaiseEventAsync(_instanceId, _eventIdentifier);
            }

            return starter.CreateCheckStatusResponse(req, _instanceId);
        }

        // Returns true when no instance exists or the existing one has reached a terminal state.
        private static async Task<bool> OrchestrationHasFinished(IDurableOrchestrationClient starter, ILogger log)
        {
            var existingInstance = await starter.GetStatusAsync(_instanceId);
            log.LogInformation($"Orchestration status is {existingInstance?.RuntimeStatus.ToString()}");
            return existingInstance == null
                || existingInstance.RuntimeStatus == OrchestrationRuntimeStatus.Completed
                || existingInstance.RuntimeStatus == OrchestrationRuntimeStatus.Failed
                || existingInstance.RuntimeStatus == OrchestrationRuntimeStatus.Terminated;
        }
    }
}
Expected behavior
The orchestrator runs forever and starts over after receiving the external event.
Actual behavior
At some unspecified point the orchestrator stops calling the activity function, or at least that is what appears to be happening.
To get it working again, I have to terminate the orchestration and use a new instance ID. I imagine I’d achieve the same results by deleting the instance from the instances table.
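The recovery step described above could be sketched roughly as follows, assuming the goal is to terminate the stuck instance and purge its history so that the same instance ID can be reused. `TerminateAsync` and `PurgeInstanceHistoryAsync` are existing members of `IDurableOrchestrationClient`; the function name `ResetOrchestrator`, the class name, and the bounded polling loop are hypothetical and only for illustration. Purging through the client is likely a safer route than deleting rows from the instances table by hand.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class ResetOrchestratorSketch
{
    // Same fixed instance ID as in the issue's sample code.
    const string InstanceId = "some-unique-instance-id";

    // Hypothetical helper endpoint; not part of the original issue.
    [FunctionName("ResetOrchestrator")]
    public static async Task<HttpResponseMessage> Reset(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
        [DurableClient] IDurableOrchestrationClient client,
        ILogger log)
    {
        var status = await client.GetStatusAsync(InstanceId);
        if (status == null)
        {
            return new HttpResponseMessage(HttpStatusCode.NotFound);
        }

        if (!IsTerminal(status.RuntimeStatus))
        {
            await client.TerminateAsync(InstanceId, "Manual reset of stuck orchestration");

            // Termination is processed asynchronously, so poll for a bounded time.
            for (int attempt = 0; attempt < 30; attempt++)
            {
                await Task.Delay(TimeSpan.FromSeconds(1));
                status = await client.GetStatusAsync(InstanceId);
                if (status == null || IsTerminal(status.RuntimeStatus))
                {
                    break;
                }
            }
        }

        if (status != null && !IsTerminal(status.RuntimeStatus))
        {
            log.LogWarning("Instance {InstanceId} did not terminate in time", InstanceId);
            return new HttpResponseMessage(HttpStatusCode.Conflict);
        }

        // Remove the stored state and history so the ID can be started fresh.
        await client.PurgeInstanceHistoryAsync(InstanceId);
        log.LogInformation("Purged history for {InstanceId}", InstanceId);
        return new HttpResponseMessage(HttpStatusCode.OK);
    }

    private static bool IsTerminal(OrchestrationRuntimeStatus s) =>
        s == OrchestrationRuntimeStatus.Completed
        || s == OrchestrationRuntimeStatus.Failed
        || s == OrchestrationRuntimeStatus.Terminated;
}
```

After a successful purge, the next call to the HTTP trigger from the sample should find no existing instance and start a new one under the original ID.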
App Details
- Microsoft.Azure.WebJobs.Extensions.DurableTask: v2.9.2
- Azure Functions runtime version: 4
- Programming language used: dotnet
I should add that I’ve recently updated to the latest version (v2.9.5) of the DurableTask package, so I’m currently monitoring the function with that version.
Issue Analytics
- Created 4 months ago
- Comments: 8 (4 by maintainers)
Thanks for your response, @nytiannn! Yes, you’re correct in that I don’t see any TaskScheduled/TaskCompleted events in the logs.
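One way to confirm which events are present without reading raw logs is to request the instance's history through the client. The sketch below is a rough illustration: `GetStatusAsync` with `showHistory: true` is an existing client call, but the function name `InspectHistory` is hypothetical, and the assumption that each history entry exposes an `EventType` field should be checked against the output of the extension version in use. Because the orchestrator calls ContinueAsNew, the history returned probably covers only the current generation.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json.Linq;

public static class HistoryInspectionSketch
{
    // Same fixed instance ID as in the issue's sample code.
    const string InstanceId = "some-unique-instance-id";

    // Hypothetical diagnostic endpoint; not part of the original issue.
    [FunctionName("InspectHistory")]
    public static async Task<HttpResponseMessage> Inspect(
        [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequestMessage req,
        [DurableClient] IDurableOrchestrationClient client,
        ILogger log)
    {
        DurableOrchestrationStatus status =
            await client.GetStatusAsync(InstanceId, showHistory: true);
        if (status == null)
        {
            return new HttpResponseMessage(HttpStatusCode.NotFound);
        }

        // History is a JSON array; fall back to an empty one if it is missing.
        JArray history = status.History ?? new JArray();

        // Tally entries by event type (assumed field name: "EventType").
        Dictionary<string, int> counts = history
            .GroupBy(entry => entry["EventType"]?.ToString() ?? "(unknown)")
            .ToDictionary(group => group.Key, group => group.Count());

        foreach (var pair in counts)
        {
            log.LogInformation("{EventType}: {Count}", pair.Key, pair.Value);
        }

        log.LogInformation("Runtime status: {Status}", status.RuntimeStatus);
        return new HttpResponseMessage(HttpStatusCode.OK);
    }
}
```

If the tally shows the orchestration start and the external event but no task-related entries, that would be consistent with the activity never being scheduled or never completing.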
I’ll continue to monitor the function and will check the work item queue if I experience the same behaviour again.
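Checking the work item queue can also be scripted. The following is a minimal sketch, assuming the default Azure Storage backend, where the work-item queue is named `<taskhub>-workitems`, and assuming the `Azure.Storage.Queues` package is available. The task hub name `mytaskhub` is a placeholder, and reading the connection string from `AzureWebJobsStorage` is an assumption about how the app is configured.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Storage.Queues;
using Azure.Storage.Queues.Models;

public static class WorkItemQueueCheck
{
    public static async Task Main()
    {
        // Assumption: the storage connection string is exposed in this variable.
        string connectionString = Environment.GetEnvironmentVariable("AzureWebJobsStorage");

        // Placeholder task hub name; replace with the one configured in host.json.
        string taskHubName = "mytaskhub";
        string queueName = $"{taskHubName.ToLowerInvariant()}-workitems";

        var queue = new QueueClient(connectionString, queueName);

        // Approximate number of activity messages waiting to be processed.
        QueueProperties properties = await queue.GetPropertiesAsync();
        Console.WriteLine($"{queueName}: ~{properties.ApproximateMessagesCount} message(s)");

        // Peek (without dequeuing) at a few messages to see how often they were retried.
        PeekedMessage[] peeked = await queue.PeekMessagesAsync(maxMessages: 5);
        foreach (PeekedMessage message in peeked)
        {
            Console.WriteLine(
                $"{message.MessageId} inserted {message.InsertedOn}, dequeued {message.DequeueCount} time(s)");
        }
    }
}
```

A message that stays in the queue with a steadily growing dequeue count may point to an activity that keeps failing or timing out, which would fit the conclusion reached later in the thread.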
Thank you very much, @nytiannn 😃 I really appreciate your help. I think my problem definitely lies in an activity function causing issues and not the durable functions library. I’m happy for this issue to be closed.
Thanks again.