Timeouts finish unexpectedly when events are fired inside an orchestration

As part of a demo, I have a sub-orchestration that simulates a two-step communication with a 3rd party: it sends a signal and then waits for a callback before continuing with the execution. The issue is that every time an event is raised, the timeout timer suddenly completes during the initial replay (that is, before OnEvent is processed), and the sub-orchestration fails because of this. I’m using the SQLProvider for DTF.

Sample code for the sub-orchestration:


public class CompatibilityOrchestrator : TaskOrchestration<CompatibilityResponse, CompatibilityGenerationRequest, CompatibilityResponse, string>
{
    TaskCompletionSource<CompatibilityResponse> receivedCompatResponseEvent = new TaskCompletionSource<CompatibilityResponse>();
    private readonly ILogger<CompatibilityOrchestrator> _logger;

    public CompatibilityOrchestrator(ILogger<CompatibilityOrchestrator> logger)
    {
        _logger = logger;
    }

    public override async Task<CompatibilityResponse> RunTask(OrchestrationContext context, CompatibilityGenerationRequest input)
    {
        var sv = input;
        sv.OrchestrationInstanceID = context.OrchestrationInstance.InstanceId;
        _logger.LogInformation("Sending generate compatibility report signal to 3rd Party :" + sv.OrchestrationInstanceID.ToString());

        var timeoutTask = context.CreateTimer(context.CurrentUtcDateTime.AddMinutes(5), "TimedOut");

        var winner = await Task.WhenAny(receivedCompatResponseEvent.Task, timeoutTask);

        if (winner == receivedCompatResponseEvent.Task && receivedCompatResponseEvent.Task.Result != null)
        {
            return receivedCompatResponseEvent.Task.Result;
        }
        else
        {
            throw new TimeoutException();
        }
    }

    public override void OnEvent(OrchestrationContext context, string name, CompatibilityResponse compatResponse)
    {
        if (name.Equals("ReceiveCompatResponseEvent"))
        {
            receivedCompatResponseEvent.SetResult(compatResponse);
        }
    }
}
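
The wait-for-callback-or-timeout pattern used in RunTask can be sketched in isolation with plain .NET tasks. This is only a model of the race, not DTF code: Task.Delay stands in for context.CreateTimer (inside a real orchestration the durable timer has to be used, since orchestration code must stay deterministic), and the class name EventOrTimeout, the method WaitAsync and the "compat-ok" payload are hypothetical names introduced for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class EventOrTimeout
{
    // Waits for the external event or the timeout, whichever completes first.
    public static async Task<string> WaitAsync(Task<string> eventTask, TimeSpan timeout)
    {
        // Stand-in for context.CreateTimer(...); an assumption made for this sketch only.
        Task timeoutTask = Task.Delay(timeout);

        Task winner = await Task.WhenAny(eventTask, timeoutTask);
        if (winner == eventTask)
        {
            return await eventTask;
        }

        throw new TimeoutException();
    }

    public static async Task Main()
    {
        var callback = new TaskCompletionSource<string>();

        // Simulates the 3rd party calling back after 100 ms (the role OnEvent plays above).
        Task simulatedCallback = Task.Delay(100)
            .ContinueWith(t => callback.TrySetResult("compat-ok"));

        string result = await WaitAsync(callback.Task, TimeSpan.FromSeconds(5));
        Console.WriteLine(result);
        await simulatedCallback;
    }
}
```

With a correctly behaving timer the callback wins this race whenever it arrives before the deadline. The behaviour reported in this issue corresponds to the timeout branch being taken even though the deadline is still far in the future.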

The event is raised from the following endpoint:

        [HttpPost]
        [Route("api/{partitionId}/compatResponse")]
        public async Task CompatibilityReportResponse([FromRoute] string partitionId, [FromBody] CompatibilityResponse data)
        {
            await _workflowClient.Client.RaiseEventAsync(new OrchestrationInstance() { InstanceId = data.OrchestrationInstanceID }, "ReceiveCompatResponseEvent", data);
        }

The issue is that no matter how long I set the expiration of the timer (1 day, 1 month), whenever I raise an event for this sub-orchestration, the timer completes and the sub-orchestration fails with a TimeoutException. I was able to work around this by creating a custom activity that simulates “waiting”, and then awaiting that activity's task instead of the original timer:

public class FakeTimerActivity : TaskActivity<string, Task<string>>
    {
        private readonly ILogger<FakeTimerActivity> _logger;

        public FakeTimerActivity(ILogger<FakeTimerActivity> logger)
        {
            _logger = logger;
        }

        protected override Task<string> Execute(TaskContext context, string input)
        {
            return Task.Delay(50000).ContinueWith(t => "Hello");
        }
    }

And in the orchestration I would use this in place of the timer:

var timeoutTask = context.ScheduleTask<string>(typeof(FakeTimerActivity));

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 6

Top GitHub Comments

moldovangeorge commented, Nov 12, 2021

After taking a look at the history of the orchestration in the DB, I can see that a TimerFire event is processed before the raised event. The timestamp of the TimerFire is the same as that of TimerCreated, but that cannot be accurate, because the TimerFire happened when the event was raised (I inspected the DB before, during and after the orchestration finished). The flow seems to be this one:

  • The orchestration is created and a Timer event is registered in the NewEvents table. The VisibleTime is set correctly (a DateTime in the future).
  • The orchestration halts and waits for either the timeout or the event task to finish.
  • The event is raised, which causes the Timer event to be consumed from the NewEvents table even though its VisibleTime is still in the future. A TimerFire event is logged in the History table and the timeout task is completed.
  • The orchestration replays, finds that the timeout task is completed, and exits.
  • The raised event is processed, but the orchestration has already finished.
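
The flow above can be modelled with a small in-memory queue. This is a hedged sketch of the described symptom only, not the provider's actual code or schema: the PendingEvent record, the VisibleTimeModel class and both dequeue methods are hypothetical, and the thread does not confirm that this is where the defect lives. It just contrasts a dequeue that ignores VisibleTime with one that respects it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a row in the NewEvents table; not the real schema.
public record PendingEvent(string InstanceId, string Kind, DateTime VisibleTime);

public static class VisibleTimeModel
{
    // Matches the observed history: every pending row for the instance is
    // consumed, including a timer whose VisibleTime is still in the future.
    public static List<PendingEvent> DequeueIgnoringVisibleTime(
        IEnumerable<PendingEvent> newEvents, string instanceId) =>
        newEvents.Where(e => e.InstanceId == instanceId).ToList();

    // Expected behaviour: rows that are not yet visible stay in the queue.
    public static List<PendingEvent> DequeueRespectingVisibleTime(
        IEnumerable<PendingEvent> newEvents, string instanceId, DateTime utcNow) =>
        newEvents.Where(e => e.InstanceId == instanceId && e.VisibleTime <= utcNow).ToList();

    public static void Main()
    {
        var now = new DateTime(2021, 11, 12, 10, 0, 0, DateTimeKind.Utc);
        var newEvents = new List<PendingEvent>
        {
            new("sub-1", "Timer", now.AddMinutes(5)), // timer due in 5 minutes
            new("sub-1", "ExternalEvent", now),       // callback arriving now
        };

        foreach (var e in DequeueIgnoringVisibleTime(newEvents, "sub-1"))
            Console.WriteLine($"ignoring VisibleTime:   {e.Kind}");

        foreach (var e in DequeueRespectingVisibleTime(newEvents, "sub-1", now))
            Console.WriteLine($"respecting VisibleTime: {e.Kind}");
    }
}
```

In the first variant the timer is delivered together with the external event, which is consistent with the TimerFire entry appearing in the History table as soon as the event is raised. In the second variant only the external event is delivered and the timer stays queued until it is due.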

cgillum commented, Nov 12, 2021

Thanks for these details. It sounds like I may have misunderstood the original issue. What you’re describing sounds a lot like https://github.com/microsoft/durabletask-mssql/issues/50.
