[BUG] Subworkflow recover mode is broken
Describe the bug
When recovering the nested_parent_wf workflow, the second t1 node was incorrectly recovered with the subworkflow outputs instead of the task outputs.

We should expect the outputs to be `c` and `t1_int_output`, which correspond to the t1 task signature (https://github.com/flyteorg/flytesnacks/blob/1b86bdd84cb55ad65c9c97256d91da0c1a3e2610/cookbook/core/control_flow/subworkflows.py#L67) and not the subworkflow signature observed in the buggy recovered t1.
Expected behavior
Subworkflow recovery should work correctly.
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn’t been raised already?
- Yes
Have you read the Code of Conduct?
- Yes
Issue Analytics
- State:
- Created a year ago
- Comments:5 (5 by maintainers)
@katrogan this is a correctness issue. Can we get on this?
OK, this is actually pretty complex - and I can confirm dynamic tasks are broken too, possibly worse. As explained above, when we attempt to recover a node execution we use the NodeID from the NodeExecutionMetadata. This means that nodes with parents will not include the parent info, which differs from how the node ID is set during eventing. As a result, recovery of the subworkflow node "n1" searches for "n1" in the top-level workflow execution rather than "n0-0-n1" as it should. That fix is relatively easy, but unfortunately it doesn't cover all cases.
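To make the mismatch concrete, here is a minimal sketch (not the actual flytepropeller implementation - the function name and signature are hypothetical) of how a fully qualified node ID is built during eventing by prepending the parent node ID and retry attempt, versus the bare NodeID that recovery currently uses:

```go
package main

import (
	"fmt"
	"strings"
)

// buildFullyQualifiedNodeID is a hypothetical sketch of how a unique
// node ID is derived during eventing: the parent node ID and the
// parent's retry attempt are prepended to the bare node ID.
func buildFullyQualifiedNodeID(parentID string, parentRetryAttempt int, nodeID string) string {
	if parentID == "" {
		// Top-level nodes have no parent prefix.
		return nodeID
	}
	return strings.Join([]string{parentID, fmt.Sprint(parentRetryAttempt), nodeID}, "-")
}

func main() {
	// Subworkflow node "n1" under parent node "n0", retry attempt 0,
	// as recorded during eventing:
	fmt.Println(buildFullyQualifiedNodeID("n0", 0, "n1")) // n0-0-n1

	// Recovery uses the bare NodeID from NodeExecutionMetadata,
	// so it searches for just "n1" and misses the stored execution:
	fmt.Println(buildFullyQualifiedNodeID("", 0, "n1")) // n1
}
```

The quick fix described above amounts to making recovery reconstruct the same prefixed ID that eventing produced.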
The real problem arises with dynamic tasks, because the parent info may include a retry attempt (which for subworkflows will always be 0). If the parent dynamic task fails (incrementing the retry attempt to 1), then as subnodes execute their fully qualified IDs are prefixed with the parent node ID and retry attempt, e.g. "n0-1". So in this example, rather than "n0-0-n1" the ID would be "n0-1-n1". If we attempt to recover this dynamic task, we will search under "n0-0" because recovery doesn't know whether the parent was retried (this may require an additional lookup in flyteadmin, and gets considerably more complex).
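The retry ambiguity can be sketched as follows (again hypothetical names, not the real implementation): the ID recorded during execution embeds the parent's actual attempt number, while recovery assumes attempt 0, so the lookup misses:

```go
package main

import "fmt"

// qualifiedID is a hypothetical sketch of the fully qualified subnode
// ID format discussed above: parentID-attempt-nodeID.
func qualifiedID(parentID string, attempt int, nodeID string) string {
	return fmt.Sprintf("%s-%d-%s", parentID, attempt, nodeID)
}

func main() {
	// The parent dynamic task failed once, so its subnodes executed
	// under retry attempt 1 and were recorded with that prefix:
	recorded := qualifiedID("n0", 1, "n1")

	// Recovery has no knowledge of the prior attempt and assumes 0:
	assumed := qualifiedID("n0", 0, "n1")

	fmt.Println(recorded)            // n0-1-n1
	fmt.Println(assumed)             // n0-0-n1
	fmt.Println(recorded == assumed) // false: the lookup misses the completed subnode
}
```

A miss here is not incorrect - the subnode is simply re-executed - which is why the quick fix is safe but can recompute completed work.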
TL;DR: the quick fix is relatively easy and will not result in incorrect behavior, but in corner cases it will recompute previously completed tasks. The correct fix is probably a bit more complex. cc @katrogan - thoughts?