
successful DAG run fails to be scheduled after being manually cleared if Dag.dagrun_timeout is set


Apache Airflow version: 2.0.0

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): redhat 7.9
  • Kernel (e.g. uname -a): 3.10.0-1160.11.1.el7.x86_64
  • Install tools: pip
  • Others:

What happened:

I cleared a successful DAG run, and it failed to be scheduled again, with the following error message:

{scheduler_job.py:1639} INFO - Run scheduled__2021-02-02T16:00:00+00:00 of some_job has timed-out

After removing dagrun_timeout, the same DAG run can be rescheduled.
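The failure mode can be illustrated without Airflow. Below is a minimal sketch (an assumption about the scheduler's logic, not Airflow's actual code) of a timeout check that compares wall-clock time against the run's original start_date. Because clearing a run does not reset its start_date, an old run exceeds even a generous timeout immediately:

```python
from datetime import datetime, timedelta, timezone

def dagrun_timed_out(start_date: datetime, dagrun_timeout: timedelta) -> bool:
    """Hypothetical approximation of the scheduler's check: a run is
    timed out once (now - start_date) exceeds dagrun_timeout."""
    return datetime.now(timezone.utc) - start_date > dagrun_timeout

# A run originally started yesterday keeps its old start_date after being
# cleared, so a 1-minute timeout trips on the very next scheduler loop:
stale_start = datetime.now(timezone.utc) - timedelta(days=1)
print(dagrun_timed_out(stale_start, timedelta(minutes=1)))  # True

# A run whose start_date had been reset to "now" would be fine:
fresh_start = datetime.now(timezone.utc)
print(dagrun_timed_out(fresh_start, timedelta(minutes=1)))  # False
```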

What you expected to happen:

I expected the timeout to be counted from the moment the DAG run is cleared.

Anything else we need to know:

I can reproduce it with the following code. I am not sure whether this is the intended behavior.

import pendulum
from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
    "test_timeout",
    default_args={"owner": "airflow"},
    start_date=pendulum.yesterday(),
    schedule_interval="@daily",
    # Add the following parameter after the first run completes,
    # then clear the successful run:
    # dagrun_timeout=pendulum.duration(minutes=1),
) as dag:
    DummyOperator(task_id="dummy")

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

mattellis commented on Mar 5, 2021 (3 reactions)

@YangMuye @kaxil Our users have the same issue. In case it is of use: I figured out that the models.dag.clear method invoked when a DAG run is cleared from the graph or tree view does not set the activate_dag_runs flag to True in the models.taskinstance.clear_task_instances method, which is where the run's start_date would be reset to now(): https://github.com/apache/airflow/blob/09327ba6b371aa68cf681747c73a7a0f4968c173/airflow/models/dag.py#L1328

if activate_dag_runs and tis:
...
    dr.start_date = timezone.utcnow()

https://github.com/apache/airflow/blob/09327ba6b371aa68cf681747c73a7a0f4968c173/airflow/models/taskinstance.py#L221

Alternatively, the Browse > DAG Runs view uses a different code path to clear DAG runs, and that path does reset the DagRun start date: https://github.com/apache/airflow/blob/09327ba6b371aa68cf681747c73a7a0f4968c173/airflow/www/views.py#L3429

We have advised our users to use that view as a workaround for now, and have raised our usual 24-hour dagrun_timeout to 7 days to cover the most common case of clearing recently failed runs.
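The fix idea described above can be sketched in isolation: when a run's task instances are cleared with activate_dag_runs set, the run's start_date should be reset so the dagrun_timeout clock restarts. The classes below are hypothetical stand-ins for illustration, not Airflow's ORM models:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical stand-in for Airflow's DagRun model (illustration only).
@dataclass
class FakeDagRun:
    state: str
    start_date: datetime

def clear_dag_run(dr: FakeDagRun, activate_dag_runs: bool = True) -> None:
    """Clear a run; when activate_dag_runs is set, also restart its clock."""
    if activate_dag_runs:
        dr.state = "running"
        # Resetting start_date is what prevents the immediate dagrun_timeout
        # observed in this issue.
        dr.start_date = datetime.now(timezone.utc)

run = FakeDagRun(state="success",
                 start_date=datetime(2021, 2, 2, tzinfo=timezone.utc))
clear_dag_run(run)
print(run.state)  # running
```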

kaxil commented on Jul 27, 2021 (0 reactions)

Closing: fixed by https://github.com/apache/airflow/pull/16401, which will be part of Airflow 2.1.3.
