Scheduler overloaded when backfilling by clearing DAG history
Apache Airflow version
2.1.2
Operating System
Debian 10
Versions of Apache Airflow Providers
apache-airflow-providers-apache-spark
Deployment
Other Docker-based deployment
Deployment details
Deployed on AWS EKS (Kubernetes version 1.21), backed by an RDS database (Postgres API), using the Kubernetes Executor.
What happened
I cleared ±2000 runs of the same DAG in order to reprocess a dataset. This caused 10,000+ tasks to switch to the status “none”. The number of tasks from this DAG that is allowed to run is fairly limited (task_concurrency=2 for the tasks, max_active_runs=3 for the DAG). These limits seem to be honoured, and no excessive number of tasks is being scheduled by this DAG.
However, other DAGs were prevented from running their tasks. Given that no task limits were being hit, I suspect that the 10,000 tasks with status “none” kept the scheduler over-occupied, so no useful work actually got scheduled.
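For context, a minimal sketch of how those concurrency limits are set in the DAG definition (the dag id, schedule, and task below are illustrative placeholders, not the actual DAG; task_concurrency is the 2.1.x parameter name, later renamed to max_active_tis_per_dag):

```python
# Sketch only: dag_id, dates and the task are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
    dag_id="dataset_reprocessing",    # hypothetical dag id
    start_date=datetime(2016, 1, 1),
    schedule_interval="@daily",
    max_active_runs=3,                # at most 3 active runs of this DAG at a time
    catchup=True,
) as dag:
    DummyOperator(
        task_id="process_partition",  # hypothetical task
        task_concurrency=2,           # at most 2 concurrent instances of this task
    )
```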
What you expected to happen
I expected the backfilling process not to block other DAGs from scheduling tasks, potentially by having the scheduler ignore tasks that would violate the max_active_runs limit.
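One user-side workaround I can think of (purely a sketch of my own, with a hypothetical dag id and date range) would be to clear the backlog in batches and let each batch drain before clearing the next, so the scheduler never sees all 10,000 “none” task instances at once:

```python
# Workaround sketch, not a fix: batch the clearing so the scheduler's backlog stays small.
# The dag id, date range and batch size are assumptions for illustration.
import time
from datetime import datetime, timedelta

from airflow.models import DagBag, DagRun
from airflow.utils.state import State

DAG_ID = "dataset_reprocessing"            # hypothetical dag id
dag = DagBag().get_dag(DAG_ID)

batch_start = datetime(2016, 1, 1)
end = datetime(2022, 1, 1)
step = timedelta(days=30)                  # ~30 runs per batch for a daily schedule

while batch_start < end:
    batch_end = min(batch_start + step, end)
    # Reset the task instances in this window; they re-enter scheduling as state "none".
    dag.clear(start_date=batch_start, end_date=batch_end)
    # Wait for the cleared runs to drain before clearing the next window.
    while DagRun.find(dag_id=DAG_ID, state=State.RUNNING):
        time.sleep(60)
    batch_start = batch_end
```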
How to reproduce
- Take a DAG with 2000 DagRuns and 5+ tasks per run
- Set the state of all 2000 DagRuns to cleared
- Observe starvation of other DAGs trying to schedule tasks concurrently with the backfill (a reproduction sketch follows this list)
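A minimal reproduction sketch (dag id, dates and task names are placeholders chosen to yield roughly 2000 daily runs with 5 tasks each):

```python
# Repro sketch: a daily DAG with 5 chained tasks and a ~5.5-year catchup window,
# i.e. on the order of 2000 runs and 10,000 task instances. Names/dates are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
    dag_id="starvation_repro",
    start_date=datetime(2016, 1, 1),
    schedule_interval="@daily",
    max_active_runs=3,
    catchup=True,
) as dag:
    previous = None
    for i in range(5):
        task = DummyOperator(task_id=f"step_{i}", task_concurrency=2)
        if previous is not None:
            previous >> task
        previous = task
```

Once all runs have completed, clear the whole history in one go (for example with `airflow tasks clear starvation_repro --start-date 2016-01-01 --end-date 2022-01-01 --yes`) and watch whether tasks from unrelated DAGs still get scheduled.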
Anything else
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
Comments: 17 (9 by maintainers)
That is because SQLite is more of a database than MySQL.
Well, I actually meant the latest release in the 2.2.* line; I didn’t check what the latest one was. I’ll see if I can find a way to reproduce the behaviour on a local setup with our current images next week; that will also help triage whether it’s fixed on the latest stable 2.2.1 or the RCs of 2.2.2.