Mistral: 2 bugs in one: workflow timeout/cancellation and wait-before
Hi,
I discovered two bugs while trying to reproduce just one 😃
I'm running the latest st2, v2.2.0.
- Something is off with cancellation and timed-out tasks when the nested tasks are Mistral workflows. Cancelling the parent workflow does not affect its children. Cancelling them also sometimes causes all sorts of unexpected results, such as ChatOps notifications being triggered indefinitely. In my example the parent task times out, but the child workflow keeps running over and over. If you remove the wait-before parameter, the child finishes once all retries are exhausted, but that doesn't make it a valid workaround.
- Adding wait-before to a task seems to cause previously published variables to be re-initialized.
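A hypothetical Python model of the retry loop in the inner workflow shown below (wf_cancelation_issue_inner) shows how the two symptoms could combine. increase_attempt_number publishes attempt = ($.get('attempt') or 0) + 1, task3 always fails, and on-error loops back while attempt < retries. If wait-before really re-initializes published variables, attempt stays at 1, the guard never turns false, and the loop never ends. The reset is an assumption taken from the report, not confirmed Mistral behaviour. The cap parameter is an illustrative stand-in for "runs forever".

```python
"""Hypothetical model of the suspected wait-before bug (not Mistral code).

Mirrors the inner workflow's loop: increase_attempt_number publishes
attempt = ($.get('attempt') or 0) + 1, task3 always fails (exit 1), and
on-error loops back while attempt < retries. reset_published=True is an
ASSUMED model of the reported re-initialization: the published value is
dropped before each increment.
"""


def count_task3_runs(retries: int = 5, reset_published: bool = False,
                     cap: int = 1_000) -> int:
    """Count task3 runs; reaching `cap` stands in for an endless loop."""
    context: dict = {}
    runs = 0
    while runs < cap:
        if reset_published:
            # Assumed effect of the bug: the previously published value is lost.
            context.pop("attempt", None)
        # increase_attempt_number: <% ($.get('attempt') or 0) + 1 %>
        context["attempt"] = (context.get("attempt") or 0) + 1
        runs += 1  # task3 runs and exits with status 1
        # on-error guard: increase_attempt_number: <% $.attempt < $.retries %>
        if not context["attempt"] < retries:
            break
    return runs


if __name__ == "__main__":
    print(count_task3_runs(reset_published=False))  # 5: retries exhausted
    print(count_task3_runs(reset_published=True))   # 1000: hits the cap
```

Without the reset the loop stops after 5 runs. This matches the report that the workflow finishes once retries are exhausted when wait-before is removed.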
Here are simple workflows and an alias to reproduce the issues (sorry for the names I gave them):
wf_cancelation_issue.meta.yaml:
---
name: wf_cancelation_issue
parameters:
  skip_notify:
    default:
      - task
      - error
      - success
    type: array
    description: List of tasks to skip notifications for.
  task:
    type: string
    description: The name of the task to run for reverse workflow.
  workflow:
    type: string
    description: The name of the workflow to run if the entry_point is a workbook
      of many workflows. The name should be in the format "<pack_name>.<action_name>.<workflow_name>".
      If entry point is a workflow or a workbook with a single workflow, the runner
      will identify the workflow automatically.
  context:
    default: {}
    type: object
    description: Additional workflow inputs.
tags: []
description: Reproducing a bug with mistral when task is cancelled or timedout
enabled: true
entry_point: workflows/wf_cancelation_issue.yaml
notify: {}
uid: action:c_int:wf_cancelation_issue
pack: c_int
ref: c_int.wf_cancelation_issue
runner_type: mistral-v2
workflows/wf_cancelation_issue.yaml:
---
version: '2.0'
c_int.wf_cancelation_issue:
  tasks:
    task:
      action: core.noop
      on-success:
        - success
    success:
      action: c_int.wf_cancelation_issue_inner
      timeout: 30
wf_cancelation_issue_inner.meta.yaml:
---
name: wf_cancelation_issue_inner
parameters:
  skip_notify:
    default:
      - task1
      - increase_attempt_number
      - task3
      - end
    type: array
    description: List of tasks to skip notifications for.
  task:
    type: string
    description: The name of the task to run for reverse workflow.
  workflow:
    type: string
    description: The name of the workflow to run if the entry_point is a workbook
      of many workflows. The name should be in the format "<pack_name>.<action_name>.<workflow_name>".
      If entry point is a workflow or a workbook with a single workflow, the runner
      will identify the workflow automatically.
  context:
    default: {}
    type: object
    description: Additional workflow inputs.
  retries:
    type: integer
    required: false
    default: 5
tags: []
description: Reproducing a bug with mistral when task is cancelled or timedout
enabled: true
entry_point: workflows/wf_cancelation_issue_inner.yaml
notify: {}
uid: action:c_int:wf_cancelation_issue_inner
pack: c_int
ref: c_int.wf_cancelation_issue_inner
runner_type: mistral-v2
workflows/wf_cancelation_issue_inner.yaml:
---
version: '2.0'
c_int.wf_cancelation_issue_inner:
  type: direct
  input:
    - retries
  tasks:
    task1:
      action: core.noop
      on-success:
        - increase_attempt_number
    increase_attempt_number:
      action: core.noop
      publish:
        attempt: <% ($.get('attempt') or 0) + 1 %>
      on-success:
        - task3
    task3:
      wait-before: 10
      action: core.local
      input:
        cmd: 'echo <% $.attempt %>; exit 1'
      on-success:
        - end
      on-error:
        - increase_attempt_number: <% $.attempt < $.retries %>
    end:
      action: core.noop
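Here is a rough timeline check, a sketch under stated assumptions rather than a description of Mistral's scheduler. The inner workflow makes at most retries = 5 task3 attempts, each delayed by wait-before: 10, while the parent's success task has timeout: 30. The sketch assumes action run times are zero and that the child starts at t = 0, when the parent's success task starts. It lists when each task3 run would begin and which runs fall strictly after the parent's timeout, i.e. runs that should not happen if the timeout cancelled the child.

```python
"""Back-of-the-envelope timeline for the reproduction (assumptions only).

Assumes: task3 always fails, every task3 run is preceded by its
wait-before delay, action run times are zero, and the child workflow
starts at t=0 together with the parent's `success` task (timeout: 30).
"""


def task3_start_times(retries: int = 5, wait_before: int = 10) -> list[int]:
    """Start time in seconds of each task3 run in a bounded retry loop."""
    starts = []
    clock = 0
    for _ in range(retries):  # one task3 run per attempt (attempt = 1..retries)
        clock += wait_before  # wait-before delays the run by a fixed amount
        starts.append(clock)
    return starts


def runs_after_timeout(starts: list[int], timeout: int = 30) -> list[int]:
    """Start times of task3 runs that begin strictly after the parent timeout."""
    return [t for t in starts if t > timeout]


if __name__ == "__main__":
    starts = task3_start_times()
    print("task3 starts at:", starts)                           # [10, 20, 30, 40, 50]
    print("after parent timeout:", runs_after_timeout(starts))  # [40, 50]
```

Even in the non-looping case the child needs at least 50 seconds, so the parent's 30-second timeout fires while the child is still retrying.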
aliases/wf_cancelation_issue.yaml:
---
name: alias_wf_cancelation_issue
enabled: true
action_ref: c_int.wf_cancelation_issue
description: Testing timeout and cancelation issue
formats:
  - display: "wf_cancel_test"
    representation:
      - "wf_cancel_test"
ack:
  enabled: true
  format: 'WF Cancelation and Timeout workflow started...'
  append_url: true
result:
  extra:
    slack:
      color: "{% if execution.result is defined and execution.result.extra is defined and execution.result.extra.state is defined and execution.result.extra.state == 'SUCCESS' %}#219939{% else %}#d80015{% endif %}"
  format: |
    WF cancelation and timeout task is complete. {~}
    ```
    {% if execution.result is defined and execution.result.extra is defined and execution.result.extra.state is defined and execution.result.extra.state == 'SUCCESS' %}
    All good.
    {% else %}
    No good.
    {% endif %}
    ```
Top GitHub Comments
The source of the wait-before bug has been identified at https://bugs.launchpad.net/mistral/+bug/1681562. Please follow the link to review the comments. We will need to wait for the rest of the Mistral core team to provide feedback on the use of the cache and how to work around this issue.
Again, please file separate issues in separate posts next time.