
Clearing of historic Task or DagRuns leads to failed DagRun

See original GitHub issue

Apache Airflow version: 2.0.0

Environment:

  • OS (e.g. from /etc/os-release): Amazon Linux 2
  • Kernel (e.g. uname -a): Linux

What happened:

Clearing a DagRun whose execution_date is far enough in the past that the difference between now and that execution_date exceeds the DagRun timeout causes the run to fail immediately on clear instead of running.

What you expected to happen: The DagRun should enter the running state.

I suspect this is a bug where the existing DagRun's duration is not reset on clear, so the new DagRun times out as soon as it starts.
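The suspected mechanism can be illustrated with a toy model. This is not Airflow's actual scheduler code; the function and timestamps below are hypothetical. The point is that if the run's recorded start time is not reset on clear, the elapsed time already exceeds `dagrun_timeout` the moment the scheduler re-evaluates the run:

```python
from datetime import datetime, timedelta, timezone

def dag_run_timed_out(start_date: datetime, dagrun_timeout: timedelta,
                      now: datetime) -> bool:
    """Toy check: has the run exceeded its timeout, measured from start_date?"""
    return (now - start_date) > dagrun_timeout

now = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)
timeout = timedelta(minutes=30)

# A historic run whose start_date was NOT reset when it was cleared:
stale_start = now - timedelta(hours=2)
print(dag_run_timed_out(stale_start, timeout, now))  # True -> fails immediately

# The same run if start_date were reset to "now" on clear:
print(dag_run_timed_out(now, timeout, now))          # False -> runs normally
```

Under this model, any run older than its timeout can never be cleared successfully, which matches the behaviour reported below.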

How to reproduce it:

Create a DAG with a dagrun_timeout. Trigger the DAG, wait for the timeout plus one minute, then clear the resulting DagRun. The new DagRun will immediately fail.

Anything else we need to know: This occurs regardless of whether the DAG/tasks succeeded or failed. As a consequence, any DagRun that has timed out can never be cleared.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 7
  • Comments: 20 (7 by maintainers)

Top GitHub Comments

renanleme commented, Jun 22, 2021 (2 reactions)

We have this problem on our cluster, and I found a workaround until it is fixed in a newer version: clear the failed run from the homepage's DagRun list instead of the DAG's tree view.

For example, say an execution started failing after you cleared it. On the homepage, search for the DAG and you will see that an execution failed. Click on the failure count to open the DagRun list, where you will be able to see the failed run.

Then select it and clear its state. This clears the execution and it works as expected 🙏🏻

The code path for clearing from this view is probably different from the one behind the tree view UI; hope this helps you fix it 😃
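Taken together with the reporter's hypothesis, this workaround may succeed because that code path resets the run's start time along with its state. Purely as an illustration of that reset, the snippet below runs against a throwaway SQLite stand-in for the metadata database. A `dag_run` table with `state` and `start_date` columns does exist in Airflow's schema, but this is not a supported way to manipulate a real deployment:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Throwaway in-memory stand-in for Airflow's metadata DB.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dag_run (
    dag_id TEXT, execution_date TEXT, start_date TEXT, state TEXT)""")

old = (datetime.now(timezone.utc) - timedelta(hours=2)).isoformat()
conn.execute("INSERT INTO dag_run VALUES ('my_dag', ?, ?, 'failed')",
             (old, old))

# The effect a working "clear" would need: reset start_date alongside
# the state so the dagrun_timeout clock restarts from now.
now = datetime.now(timezone.utc).isoformat()
conn.execute("""UPDATE dag_run SET state = 'running', start_date = ?
                WHERE dag_id = 'my_dag'""", (now,))

state, start = conn.execute(
    "SELECT state, start_date FROM dag_run WHERE dag_id = 'my_dag'"
).fetchone()
print(state, start == now)  # running True
```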

kaxil commented, Jun 8, 2021 (2 reactions)

Looks like a bug, needs fixing - added to 2.1.1 milestone


Top Results From Across the Web

airflow stops scheduling dagruns after task failure
This means that your Dag is trying to run, but it is waiting until the corresponding task from the previous DagRun has a...

DAG Runs — Airflow Documentation - Apache Airflow
Any time the DAG is executed, a DAG Run is created and all tasks inside it ... failed if any of the leaf...

Rerun Airflow DAGs | Astronomer Documentation
In this guide, you'll learn how to rerun tasks or DAGs and trigger historical DAG runs, and review the Airflow concepts of catchup...

Clean up the Airflow database | Cloud Composer
This data includes information and logs related to past DAG runs, tasks, ... This DAG removes old entries from DagRun, TaskInstance, Log, XCom, ...

How To Fix Task received SIGTERM signal In Airflow
Fixing the SIGTERM signal in Apache Airflow tasks ... how long a DagRun should be up before timing out / failing, so that...
