Scheduler overloaded when backfilling by clearing DAG history

Apache Airflow version

2.1.2

Operating System

Debian 10

Versions of Apache Airflow Providers

apache-airflow-providers-apache-spark

Deployment

Other Docker-based deployment

Deployment details

Deployed on AWS EKS (Kubernetes version 1.21), backed by an RDS database (Postgres API), using the Kubernetes Executor.

What happened

I cleared approximately 2,000 runs of the same DAG in order to reprocess a dataset. This caused more than 10,000 tasks to switch to the status “none”. The number of tasks in this DAG that are allowed to run at once is fairly limited (task_concurrency=2 for the tasks, max_active_runs=3 for the DAG). These limits seem to be honoured, and this DAG is not scheduling an excessive number of tasks.

However, other DAGs were prevented from running their tasks. Given that no task limits were being hit, I suspect that the 10,000 tasks with status “none” kept the scheduler so occupied that no useful work actually got scheduled.
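
The suspicion above can be illustrated with a toy model. This is not Airflow's scheduler code: it assumes a hypothetical scheduler that examines a fixed-size batch of runs per loop and sends examined runs to the back of the queue. The function name `loops_until_examined` and the batch size of 20 are made up for illustration.

```python
from collections import deque


def loops_until_examined(backlog_runs: int, batch_size: int) -> int:
    """Count scheduler loops until the run queued behind the backlog is examined.

    Toy model only: each loop examines `batch_size` runs from the front of the
    queue and then moves them to the back.
    """
    queue = deque(["backfill"] * backlog_runs + ["other_dag"])
    loops = 0
    while True:
        loops += 1
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        if "other_dag" in batch:
            return loops
        queue.extend(batch)  # examined backlog runs rejoin the queue at the back


print(loops_until_examined(0, 20))     # 1
print(loops_until_examined(2000, 20))  # 101
```

Under these assumptions, a backlog of 2,000 cleared runs delays the other DAG's run by about a hundred loops, even though almost none of the backlog runs are allowed to start.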

What you expected to happen

I expected the backfilling process not to block other DAGs from scheduling tasks, potentially by having the scheduler ignore tasks that would violate the max_active_runs limit.
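
A minimal sketch of that expectation, assuming a hypothetical pre-filter that drops runs whose DAG is already at its max_active_runs limit before the scheduler looks at them. The `Run` class, the "running"/"waiting" states, and `runs_to_examine` are invented for this example and are not Airflow APIs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    dag_id: str
    state: str  # hypothetical states for this sketch: "running" or "waiting"


def runs_to_examine(runs, max_active_runs):
    """Keep running runs; admit waiting runs only while their DAG is under its limit."""
    active = Counter(r.dag_id for r in runs if r.state == "running")
    selected = []
    for run in runs:
        if run.state == "running":
            selected.append(run)  # active runs still need their tasks scheduled
        elif active[run.dag_id] < max_active_runs[run.dag_id]:
            active[run.dag_id] += 1  # this run now counts against the limit
            selected.append(run)
    return selected


runs = (
    [Run("backfill", "running")] * 3
    + [Run("backfill", "waiting")] * 1997
    + [Run("other_dag", "waiting")]
)
selected = runs_to_examine(runs, {"backfill": 3, "other_dag": 1})
print(len(selected), selected[-1].dag_id)  # 4 other_dag
```

With the filter in place, the 1,997 waiting backfill runs never reach the scheduler's working set, so the other DAG's run is considered immediately.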

How to reproduce

  • Take a DAG with 2,000 DagRuns and 5+ tasks per run
  • Clear all 2,000 DagRuns
  • Observe that other DAGs trying to schedule tasks concurrently with the backfill are starved

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 17 (9 by maintainers)

Top GitHub Comments

3 reactions
ashb commented, Oct 5, 2021

interestingly, sqlite3 3.25 does have window functions …

That is because SQLite is more of a database than MySQL.

1 reaction
theister commented, Nov 12, 2021

Well, I actually meant the latest release in the 2.2.* line, I didn’t check what the latest one was.

I’ll see if I can find a way to reproduce the behaviour on a local setup on our current images next week. That will also help with triaging whether it’s fixed on the latest stable 2.2.1 or the RCs of 2.2.2.
