Clearing of historic Task or DagRuns leads to failed DagRun
Apache Airflow version: 2.0.0
Environment:
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release): Amazon Linux 2
- Kernel (e.g. `uname -a`): Linux
- Install tools:
- Others:
What happened:
Clearing a DagRun whose execution_date lies far enough in the past that the time between now and that execution_date exceeds the DagRun timeout causes the DagRun to fail immediately on clear instead of running.
What you expected to happen: The DagRun should enter the Running state.
I suspect this is a bug where the existing DagRun's duration is not reset on clear, so the new DagRun times out as soon as it starts.
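A minimal sketch of the suspected check, assuming the scheduler compares dagrun_timeout against the run's original start_date (a simplified illustration, not the actual Airflow source):

```python
from datetime import datetime, timedelta, timezone

def dag_run_timed_out(start_date: datetime, dagrun_timeout: timedelta) -> bool:
    # If clearing leaves start_date at its old value, the elapsed time
    # already exceeds the timeout, so the cleared run would be marked
    # failed the moment the scheduler re-examines it.
    return datetime.now(timezone.utc) - start_date > dagrun_timeout
```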
How to reproduce it:
Create a DAG with a DagRun timeout, for example the sketch below. Trigger the DAG, wait for the DagRun timeout plus one minute, and then clear the DagRun. The new DagRun will immediately fail.
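A minimal DAG that should reproduce this on Airflow 2.0; the dag_id and task are illustrative, and a short dagrun_timeout keeps the wait practical:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dagrun_timeout_clear_repro",   # illustrative name
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,                # trigger manually
    dagrun_timeout=timedelta(minutes=5),
    catchup=False,
) as dag:
    # Finishes well inside the five-minute timeout.
    BashOperator(task_id="sleep_briefly", bash_command="sleep 10")
```

Trigger it manually, let the run finish, wait at least six minutes from the run's start, then clear the run from the tree view; the cleared run fails immediately.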
Anything else we need to know: This occurs regardless of whether the DAG/Task succeeded or failed. Also, any DagRun that has timed out can never be cleared.
Issue Analytics
- Created: 3 years ago
- Reactions: 7
- Comments: 20 (7 by maintainers)
Top GitHub Comments
We have this problem on our cluster. I found a workaround until it is fixed in a newer version: clear the failed tasks from the homepage's dag run list instead of from the dag tree view itself. (The original comment included screenshots.) Say an execution starts failing after you clear it: go to the homepage and search for the dag, and you will see that an execution failed. Click on the failed-run count (marked in yellow in the screenshots) to open the dag run list, where you can find the failed run.
Then select it and clear the state. It will clear the execution and it will work as expected 🙏🏻
The code path for this action (clearing) is probably different from the one behind the tree view UI; hope this helps you guys fix it 😃
Looks like a bug, needs fixing - added to 2.1.1 milestone
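One possible fix direction, sketched under the assumption above that the stale start_date is what trips the timeout: reset the DagRun's start_date when it is cleared so the timeout clock restarts. This is an illustration, not the actual patch, and reset_dag_run_on_clear is a hypothetical helper:

```python
from airflow.utils import timezone
from airflow.utils.state import State

def reset_dag_run_on_clear(dag_run, session):
    """Hypothetical helper: restart the timeout clock on a cleared run."""
    dag_run.start_date = timezone.utcnow()  # measure dagrun_timeout from now
    dag_run.state = State.RUNNING
    session.merge(dag_run)
    session.commit()
```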