Task stuck in "scheduled" when running in backfill job
See original GitHub issueApache Airflow version
2.2.4
What happened
We are running airflow 2.2.4 with KubernetesExecutor. I have created a dag to run airflow backfill command with SubprocessHook. What was observed is that when I started to backfill a few days’ dagruns the backfill would get stuck with some dag runs having tasks staying in the “scheduled” state and never getting running.
We are using the default pool and the pool is totoally free when the tasks got stuck.
I could find some logs saying:
TaskInstance: <TaskInstance: test_dag_2.task_1 backfill__2022-03-29T00:00:00+00:00 [queued]> found in queued state but was not launched, rescheduling
and nothing else in the log.
What you think should happen instead
The tasks stuck in “scheduled” should start running when there is free slot in the pool.
How to reproduce
Airflow 2.2.4 with python 3.8.13, KubernetesExecutor running in AWS EKS.
One backfill command example is: airflow dags backfill test_dag_2 -s 2022-03-01 -e 2022-03-10 --rerun-failed-tasks
The test_dag_2 dag is like:
import time
from datetime import timedelta
import pendulum
from airflow import DAG
from airflow.decorators import task
from airflow.models.dag import dag
from airflow.operators.bash import BashOperator
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import PythonOperator
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'email': ['airflow@example.com'],
'email_on_failure': True,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
}
def get_execution_date(**kwargs):
ds = kwargs['ds']
print(ds)
with DAG(
'test_dag_2',
default_args=default_args,
description='Testing dag',
start_date=pendulum.datetime(2022, 4, 2, tz='UTC'),
schedule_interval="@daily", catchup=True, max_active_runs=1,
) as dag:
t1 = BashOperator(
task_id='task_1',
depends_on_past=False,
bash_command='sleep 30'
)
t2 = PythonOperator(
task_id='get_execution_date',
python_callable=get_execution_date
)
t1 >> t2
Operating System
Debian GNU/Linux
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==3.0.0 apache-airflow-providers-celery==2.1.0 apache-airflow-providers-cncf-kubernetes==3.0.2 apache-airflow-providers-docker==2.4.1 apache-airflow-providers-elasticsearch==2.2.0 apache-airflow-providers-ftp==2.0.1 apache-airflow-providers-google==6.4.0 apache-airflow-providers-grpc==2.0.1 apache-airflow-providers-hashicorp==2.1.1 apache-airflow-providers-http==2.0.3 apache-airflow-providers-imap==2.2.0 apache-airflow-providers-microsoft-azure==3.6.0 apache-airflow-providers-microsoft-mssql==2.1.0 apache-airflow-providers-odbc==2.0.1 apache-airflow-providers-postgres==3.0.0 apache-airflow-providers-redis==2.0.1 apache-airflow-providers-sendgrid==2.0.1 apache-airflow-providers-sftp==2.4.1 apache-airflow-providers-slack==4.2.0 apache-airflow-providers-snowflake==2.5.0 apache-airflow-providers-sqlite==2.1.0 apache-airflow-providers-ssh==2.4.0
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
Issue Analytics
- State:
- Created a year ago
- Reactions:3
- Comments:5 (1 by maintainers)
Is there any update regarding this issue? There are many tasks that fall into the ‘scheduled’ state when working on the backfill 😂
Apache Airflow 2.3.4 / Kubernetes Executor
Is there any update regarding this issue? My tasks that fall into the ‘scheduled’ state when working on the backfill
Apache Airflow 2.4.0 / Kubernetes Executor