Task stuck in "scheduled" when running in backfill job

Apache Airflow version

2.2.4

What happened

We are running Airflow 2.2.4 with the KubernetesExecutor. I created a DAG that runs the airflow backfill command through SubprocessHook. When I backfilled a few days of DAG runs, the backfill got stuck: some DAG runs had tasks that stayed in the “scheduled” state and never started running.

We are using the default pool, and the pool was totally free when the tasks got stuck.

The only related log line I could find was: TaskInstance: <TaskInstance: test_dag_2.task_1 backfill__2022-03-29T00:00:00+00:00 [queued]> found in queued state but was not launched, rescheduling. There was nothing else in the log.
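To find every task instance affected by this message in a larger log, the line can be parsed with a regular expression. The sketch below is based only on the single log line quoted above; the names `StuckTask` and `parse_task_instance` are hypothetical, and the pattern assumes the DAG id contains no dots and that the repr has exactly the shape shown for 2.2.4 (other Airflow versions may format it differently).

```python
import re
from typing import NamedTuple, Optional


class StuckTask(NamedTuple):
    """Fields extracted from a '<TaskInstance: ...>' repr in a log line."""
    dag_id: str
    task_id: str
    run_id: str
    state: str


# Assumption: dag_id has no dots, so the first dot separates it from task_id.
_TI_PATTERN = re.compile(
    r"<TaskInstance: (?P<dag_id>[^.\s]+)\.(?P<task_id>\S+) "
    r"(?P<run_id>\S+) \[(?P<state>\w+)\]>"
)


def parse_task_instance(line: str) -> Optional[StuckTask]:
    """Return the task instance named in the log line, or None if absent."""
    match = _TI_PATTERN.search(line)
    if match is None:
        return None
    return StuckTask(**match.groupdict())


if __name__ == "__main__":
    sample = (
        "TaskInstance: <TaskInstance: test_dag_2.task_1 "
        "backfill__2022-03-29T00:00:00+00:00 [queued]> "
        "found in queued state but was not launched, rescheduling"
    )
    print(parse_task_instance(sample))
```

Running this over each line of a log file and collecting the non-None results gives a list of the run ids that hit the rescheduling path.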

What you think should happen instead

The tasks stuck in “scheduled” should start running when there is a free slot in the pool.

How to reproduce

Airflow 2.2.4 with Python 3.8.13 and the KubernetesExecutor, running in AWS EKS.

One backfill command example is: airflow dags backfill test_dag_2 -s 2022-03-01 -e 2022-03-10 --rerun-failed-tasks
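When the command is launched from code rather than typed in a shell, it is usually safer to build it as an argument list. The sketch below assembles the same command using only the flags shown above; `build_backfill_command` is a hypothetical helper, not part of Airflow, and it only constructs the arguments without running anything.

```python
import shlex
from typing import List


def build_backfill_command(
    dag_id: str,
    start_date: str,
    end_date: str,
    rerun_failed_tasks: bool = True,
) -> List[str]:
    """Build the argv list for 'airflow dags backfill' over a date range."""
    cmd = ["airflow", "dags", "backfill", dag_id, "-s", start_date, "-e", end_date]
    if rerun_failed_tasks:
        cmd.append("--rerun-failed-tasks")
    return cmd


if __name__ == "__main__":
    argv = build_backfill_command("test_dag_2", "2022-03-01", "2022-03-10")
    # shlex.join renders the list as a shell-safe string for logging.
    print(shlex.join(argv))
```

The returned list can be handed to a subprocess-style runner as is, which avoids shell quoting problems if a DAG id or date ever contains unusual characters.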

The test_dag_2 DAG looks like this:

from datetime import timedelta

import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

# Defaults applied to every task in the DAG.
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email': ['airflow@example.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}


def get_execution_date(**kwargs):
    # Print the date string (ds) that Airflow passes in the task context.
    ds = kwargs['ds']
    print(ds)


with DAG(
        'test_dag_2',
        default_args=default_args,
        description='Testing dag',
        start_date=pendulum.datetime(2022, 4, 2, tz='UTC'),
        schedule_interval="@daily",
        catchup=True,
        max_active_runs=1,
) as dag:
    # First task: sleep for 30 seconds.
    t1 = BashOperator(
        task_id='task_1',
        depends_on_past=False,
        bash_command='sleep 30'
    )

    # Second task: print the run's date.
    t2 = PythonOperator(
        task_id='get_execution_date',
        python_callable=get_execution_date
    )

    t1 >> t2
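The DAG above sets start_date to 2022-04-02, while the example backfill command covers 2022-03-01 through 2022-03-10. Whether that mismatch plays any part in the behavior reported here is not established in this issue, but it is a cheap thing to rule out. The sketch below is plain date arithmetic with a hypothetical helper name, and it does not call Airflow.

```python
from datetime import date, timedelta


def backfill_days_before_start(
    dag_start: date, backfill_start: date, backfill_end: date
) -> int:
    """Count days in [backfill_start, backfill_end] that fall before dag_start."""
    if backfill_end < backfill_start:
        raise ValueError("backfill_end must not be earlier than backfill_start")
    # Last day of the requested range that is still before the DAG start date.
    last_early_day = min(backfill_end, dag_start - timedelta(days=1))
    if last_early_day < backfill_start:
        return 0
    return (last_early_day - backfill_start).days + 1


if __name__ == "__main__":
    early = backfill_days_before_start(
        dag_start=date(2022, 4, 2),
        backfill_start=date(2022, 3, 1),
        backfill_end=date(2022, 3, 10),
    )
    print(f"{early} backfill day(s) precede the DAG start_date")
```

A nonzero result only says that part of the requested range precedes start_date; it does not by itself explain tasks staying in “scheduled”.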

Operating System

Debian GNU/Linux

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==3.0.0
apache-airflow-providers-celery==2.1.0
apache-airflow-providers-cncf-kubernetes==3.0.2
apache-airflow-providers-docker==2.4.1
apache-airflow-providers-elasticsearch==2.2.0
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-google==6.4.0
apache-airflow-providers-grpc==2.0.1
apache-airflow-providers-hashicorp==2.1.1
apache-airflow-providers-http==2.0.3
apache-airflow-providers-imap==2.2.0
apache-airflow-providers-microsoft-azure==3.6.0
apache-airflow-providers-microsoft-mssql==2.1.0
apache-airflow-providers-odbc==2.0.1
apache-airflow-providers-postgres==3.0.0
apache-airflow-providers-redis==2.0.1
apache-airflow-providers-sendgrid==2.0.1
apache-airflow-providers-sftp==2.4.1
apache-airflow-providers-slack==4.2.0
apache-airflow-providers-snowflake==2.5.0
apache-airflow-providers-sqlite==2.1.0
apache-airflow-providers-ssh==2.4.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 3
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

4 reactions
wookiist commented, Sep 15, 2022

Is there any update regarding this issue? There are many tasks that fall into the ‘scheduled’ state when working on the backfill 😂

Apache Airflow 2.3.4 / Kubernetes Executor

0 reactions
venkateshnyq550 commented, Dec 15, 2022

Is there any update regarding this issue? My tasks fall into the ‘scheduled’ state when working on the backfill.

Apache Airflow 2.4.0 / Kubernetes Executor

Top Results From Across the Web

Airflow Backfill DAG runs stuck running with first task in ...
UPDATE: I realised my backfill runs fall outside the total dag interval. I.e before the dag start_date causing a blocking schedule dependency.

Backfilled DAGs marked as running but tasks doesn't start
When the backfill run for this DAG it succeeded for some of the DAGs, but the rest is now stuck in a running...

FAQ: Airflow Documentation
Why is task not getting scheduled? There are very many reasons why your task might not be getting scheduled. Here are some of...

Use LatestOnlyOperator to skip some tasks while running a ...
Occasionally, when we use Airflow, we have a DAG which always works on the most recent snapshot of data even if we run...

DAGs, Operators, Connections, and other issues in Apache ...
I see my tasks stuck or not completing ... see a '503' error when triggering a DAG in the CLI · Why does...
