
Possible race condition causing events to be dropped


Describe the bug

I can’t reliably reproduce this, but I believe there is a race condition when awaiting an activity and then awaiting an external event.

var submitToRouteResult = await context.CallActivityAsync<SubmitToRouteResult>("submit_to_route", @object);
 
// other orchestration code here (no activities)

var result = await context.WaitForExternalEvent<CoreDispatchAttemptResult>($"ChannelSubmissionResult_{route.Name}");

I have a scenario where the external event is dropped because it is raised before the prior activity call has completed.
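For context, here is a minimal sketch of the full orchestrator shape implied by the fragment above, assuming Durable Functions 1.x on Functions v2. The Route, SubmitToRouteResult, and CoreDispatchAttemptResult types and the DispatchOrchestrator name are hypothetical stand-ins, not from the original issue:

using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public class Route { public string Name { get; set; } }
public class SubmitToRouteResult { }
public class CoreDispatchAttemptResult { }

public static class DispatchOrchestrator
{
    [FunctionName("DispatchOrchestrator")]
    public static async Task<CoreDispatchAttemptResult> Run(
        [OrchestrationTrigger] DurableOrchestrationContext context)
    {
        var route = context.GetInput<Route>();

        // Step 1: schedule the activity and wait for its TaskCompleted event.
        var submitToRouteResult = await context.CallActivityAsync<SubmitToRouteResult>(
            "submit_to_route", route);

        // ... other orchestration code here (no activities) ...

        // Step 2: wait for the external event. If EventRaised is processed
        // before the TaskCompleted above (the reported race), this waits forever.
        return await context.WaitForExternalEvent<CoreDispatchAttemptResult>(
            $"ChannelSubmissionResult_{route.Name}");
    }
}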

When I observe the History I see the following events in this order:

| PartitionKey | RowKey | Timestamp | EventId | EventType | ExecutionId | IsPlayed | _Timestamp |
|---|---|---|---|---|---|---|---|
| 67d8b855-bd13-4cb5-b937-c330eb88b767 | 0000000000000000 | 2019-01-15T14:46:32.856Z | -1 | OrchestratorStarted | 0ccf5cd3f5ee4c84aaf93db34479e430 | false | 2019-01-15T14:46:32.640Z |
| 67d8b855-bd13-4cb5-b937-c330eb88b767 | 0000000000000001 | 2019-01-15T14:46:32.857Z | -1 | ExecutionStarted | 0ccf5cd3f5ee4c84aaf93db34479e430 | true | 2019-01-15T14:41:16.686Z |
| 67d8b855-bd13-4cb5-b937-c330eb88b767 | 0000000000000002 | 2019-01-15T14:46:32.857Z | 0 | TaskScheduled | 0ccf5cd3f5ee4c84aaf93db34479e430 | false | 2019-01-15T14:46:32.642Z |
| 67d8b855-bd13-4cb5-b937-c330eb88b767 | 0000000000000003 | 2019-01-15T14:46:32.857Z | -1 | OrchestratorCompleted | 0ccf5cd3f5ee4c84aaf93db34479e430 | false | 2019-01-15T14:46:32.642Z |
| 67d8b855-bd13-4cb5-b937-c330eb88b767 | 0000000000000004 | 2019-01-15T14:46:39.576Z | -1 | OrchestratorStarted | 0ccf5cd3f5ee4c84aaf93db34479e430 | false | 2019-01-15T14:46:39.551Z |
| 67d8b855-bd13-4cb5-b937-c330eb88b767 | 0000000000000005 | 2019-01-15T14:46:39.576Z | -1 | EventRaised | 0ccf5cd3f5ee4c84aaf93db34479e430 | true | 2019-01-15T14:46:33.170Z |
| 67d8b855-bd13-4cb5-b937-c330eb88b767 | 0000000000000006 | 2019-01-15T14:46:39.576Z | -1 | TaskCompleted | 0ccf5cd3f5ee4c84aaf93db34479e430 | true | 2019-01-15T14:46:33.240Z |
| 67d8b855-bd13-4cb5-b937-c330eb88b767 | 0000000000000007 | 2019-01-15T14:46:39.576Z | -1 | OrchestratorCompleted | 0ccf5cd3f5ee4c84aaf93db34479e430 | false | 2019-01-15T14:46:39.552Z |
| 67d8b855-bd13-4cb5-b937-c330eb88b767 | sentinel | 2019-01-15T14:46:39.577Z | | | 0ccf5cd3f5ee4c84aaf93db34479e430 | | |

Notice above how ‘EventRaised’ (RowKey 5) is logged BEFORE ‘TaskCompleted’ (RowKey 6).

This means my orchestrator waits for the event forever: the event never comes because it was already dropped while the orchestrator was still waiting for the TaskCompleted of the prior activity call.
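Until the underlying race is fixed, one hedge against waiting forever is the standard Durable Functions timeout pattern: race the event task against a durable timer with Task.WhenAny so a dropped event surfaces as a timeout rather than a hung instance. A sketch, assuming the same orchestrator body as above and an arbitrary 5-minute window:

// Inside the orchestrator body; System, System.Threading, and
// System.Threading.Tasks are assumed to be imported.
using (var cts = new CancellationTokenSource())
{
    Task<CoreDispatchAttemptResult> eventTask =
        context.WaitForExternalEvent<CoreDispatchAttemptResult>(
            $"ChannelSubmissionResult_{route.Name}");

    // Durable timer; CurrentUtcDateTime keeps the deadline replay-safe.
    DateTime deadline = context.CurrentUtcDateTime.AddMinutes(5); // assumed window
    Task timeoutTask = context.CreateTimer(deadline, cts.Token);

    if (await Task.WhenAny(eventTask, timeoutTask) == eventTask)
    {
        cts.Cancel(); // cancel the pending timer so the instance can finish
        var result = eventTask.Result;
        // ... continue with result ...
    }
    else
    {
        // The event never arrived (possibly dropped): fail fast or compensate.
        throw new TimeoutException(
            $"ChannelSubmissionResult_{route.Name} was not received in time.");
    }
}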

Other info

  1. In a different orchestration instance that completed successfully, the ‘EventRaised’ row is logged AFTER the ‘TaskCompleted’ row.

  2. If I republish the external event (something that I can’t do in a production scenario), the orchestration completes as expected.

Investigative information

  • Durable Functions extension version: 1.7.0
  • Function App version (1.0 or 2.0): 2.0
  • Programming language used: C#

  • Instance Id: 67d8b855-bd13-4cb5-b937-c330eb88b767
  • Execution Id: 0ccf5cd3f5ee4c84aaf93db34479e430
  • Timestamp: 2019-01-15T14:46:39.576Z
  • Region: UK West

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments:7 (3 by maintainers)

Top GitHub Comments

cgillum commented on Jan 16, 2019 (1 reaction)

Thanks for the details. I understand the constraints you’re dealing with, and I’m actually surprised that an issue/scenario like yours hasn’t come up before; it makes me realize that we need to deal with this more urgently.

In the meantime, is it possible to rearrange the API calls in your orchestration? For example, could you do something like this to mitigate the race?

// subscribe to the async callback but don't yet await on it
var asyncCallbackReceivedTask = context.WaitForExternalEvent<CoreDispatchAttemptResult>(
    $"ChannelSubmissionResult_{route.Name}");

// schedule the activity which will result in the callback event above
var submitToRouteResult = await context.CallActivityAsync<SubmitToRouteResult>(
    "submit_to_route",
    @object);
 
// other orchestration code here (no activities)

// now wait for the callback to be received
var result = await asyncCallbackReceivedTask;
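The reordering helps because the orchestrator records its event subscription before scheduling the activity that ultimately triggers the callback, so an early EventRaised message has a waiter to match. For completeness, the callback side would raise the event through the standard DurableOrchestrationClient.RaiseEventAsync API; a hypothetical HTTP-triggered sketch (function, route, and type names assumed, not from the original issue):

using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Newtonsoft.Json;

public static class ChannelSubmissionCallback
{
    [FunctionName("ChannelSubmissionCallback")]
    public static async Task Run(
        [HttpTrigger(AuthorizationLevel.Function, "post",
            Route = "callbacks/{instanceId}/{routeName}")] HttpRequestMessage req,
        string instanceId,
        string routeName,
        [OrchestrationClient] DurableOrchestrationClient client)
    {
        var payload = JsonConvert.DeserializeObject<CoreDispatchAttemptResult>(
            await req.Content.ReadAsStringAsync());

        // Deliver the external event to the waiting orchestration instance.
        await client.RaiseEventAsync(
            instanceId, $"ChannelSubmissionResult_{routeName}", payload);
    }
}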
cgillum commented on Mar 16, 2019 (0 reactions)

Resolved in v1.8.0 release.
