Tasks in DAGs with `depends_on_past` or `task_concurrency` are not being scheduled

Apache Airflow version: 1.10.13

What happened:

After upgrading to v1.10.13, we noticed that tasks in some of our DAGs were not being scheduled. After some investigation, we found that commenting out 'depends_on_past': True made the issue go away.
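
The workaround described above amounts to dropping one key from default_args. A minimal sketch of doing that on a plain dict, assuming a hypothetical helper named strip_depends_on_past (it is not part of Airflow, and nothing here imports Airflow):

```python
from datetime import datetime, timedelta


def strip_depends_on_past(default_args):
    """Return a copy of default_args without the 'depends_on_past' key.

    Hypothetical helper for illustration; not part of Airflow.
    """
    return {k: v for k, v in default_args.items() if k != 'depends_on_past'}


default_args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 10, 31),
    'depends_on_past': True,
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
}

# The original dict is left untouched; only the copy loses the key.
patched_args = strip_depends_on_past(default_args)
print(sorted(patched_args))
```

With the key removed, tasks fall back to the operator default (depends_on_past=False), so a run no longer waits for the previous run of the same task. That changes scheduling semantics, so it is probably best treated as a temporary workaround rather than a fix.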

What you expected to happen:

We think the issue might be related to the following change, which was introduced in 1.10.13:

[AIRFLOW-3607] Only query DB once per DAG run for TriggerRuleDep (#4751)

How to reproduce it:

  1. Install Airflow v1.10.13 from pip
  2. Start webserver and scheduler
  3. Add the following code as a DAG
  4. Switch the DAG on in the UI.

from airflow import models
from airflow.operators.dummy_operator import DummyOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 10, 31),
    'depends_on_past': True,
    'retries': 3,
    'retry_delay': timedelta(minutes=5)
}

dag_name = 'my-test-dag'

with models.DAG(dag_name,
                default_args=default_args,
                schedule_interval='0 0 * * *',
                catchup=False,
                max_active_runs=5,
                ) as dag:

    test = DummyOperator(
        task_id='test'
    ) 
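
For context, the BaseOperator documentation describes depends_on_past as letting a task instance run only if the previous instance of the same task succeeded or was skipped, with the first instance allowed to run. A simplified model of that rule follows, using a hypothetical function name may_run and plain strings for states. It is only a sketch of the documented behaviour and does not reproduce Airflow's dependency-checking code or the 1.10.13 regression:

```python
# Simplified model of the documented depends_on_past rule; this is not
# Airflow's scheduler code, and the names here are illustrative.
ALLOWED_PREVIOUS_STATES = {'success', 'skipped'}


def may_run(depends_on_past, previous_state):
    """Decide whether a task instance passes the depends_on_past check.

    previous_state is the state of the same task in the previous run,
    or None when there is no previous run.
    """
    if not depends_on_past:
        return True
    if previous_state is None:
        # The first instance has nothing to wait for.
        return True
    return previous_state in ALLOWED_PREVIOUS_STATES


for state in (None, 'success', 'skipped', 'failed', 'running'):
    print(state, may_run(True, state))
```

Under this model, the first run of the DAG above would be expected to pass the check, which is consistent with the reporter treating the unscheduled tasks as a regression.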

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10 (8 by maintainers)

Top GitHub Comments

3 reactions
kaxil commented, Nov 27, 2020

I can confirm the bug. I was able to reproduce it with LocalExecutor for tasks that set task_concurrency or depends_on_past, using the following DAG:

from airflow import models
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 10, 31),
    'retries': 3,
    'retry_delay': timedelta(minutes=5)
}

dag_name = 'dag-bugcheck'

with models.DAG(dag_name,
                default_args=default_args,
                schedule_interval='0 0 * * *',
                catchup=False,
                max_active_runs=5,
                ) as dag:

    test1 = DummyOperator(
        task_id='test1',
        task_concurrency=10,
    )

    test2 = BashOperator(
        task_id='test2',
        bash_command='echo hi',
        depends_on_past=True,
    )

    test3 = BashOperator(
        task_id='test3',
        bash_command='echo hi',
    )
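
To check which tasks in an existing deployment use the two settings named in this report, the task arguments can be audited as plain dicts. A minimal sketch, assuming a hypothetical helper find_flagged_tasks that is not part of Airflow; the task arguments below are copied by hand from the DAG above rather than read from a live DAG object:

```python
def find_flagged_tasks(default_args, tasks):
    """Map task_id -> list of settings from this report that the task uses.

    tasks maps task_id to the keyword arguments passed to the operator.
    Task-level arguments override default_args, mirroring how Airflow
    applies defaults. Hypothetical helper; not part of Airflow.
    """
    flagged = {}
    for task_id, kwargs in tasks.items():
        effective = {**default_args, **kwargs}
        hits = []
        if effective.get('depends_on_past'):
            hits.append('depends_on_past')
        if effective.get('task_concurrency') is not None:
            hits.append('task_concurrency')
        if hits:
            flagged[task_id] = hits
    return flagged


default_args = {'owner': 'airflow', 'retries': 3}
tasks = {
    'test1': {'task_concurrency': 10},
    'test2': {'bash_command': 'echo hi', 'depends_on_past': True},
    'test3': {'bash_command': 'echo hi'},
}

print(find_flagged_tasks(default_args, tasks))
# {'test1': ['task_concurrency'], 'test2': ['depends_on_past']}
```

The helper only reports which tasks set either option; it does not tell whether a given task was actually skipped by the scheduler.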
