Scheduler Memory Leak in Airflow 2.0.1

Apache Airflow version: 2.0.1

Kubernetes version: v1.17.4

Environment: Dev

OS: RHEL 7

What happened:

After running fine for some time, my Airflow tasks got stuck in the scheduled state, with the following message in Task Instance Details:

“All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:
- The scheduler is down or under heavy load

If this task instance does not start soon please contact your Airflow administrator for assistance.”

What you expected to happen:

After I restarted the scheduler, it started working fine again. When I checked my metrics, I realized the scheduler has a memory leak: over the past 4 days, its memory utilization grew to 6 GB.
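
To confirm this growth pattern without a full metrics stack, one option is to sample the scheduler process's resident memory directly and check whether it trends upward. The following is a minimal sketch, assuming a Linux host (such as RHEL 7, or inside the scheduler container) where /proc/<pid>/status is readable and the PID of the `airflow scheduler` process is known. All helper names here are hypothetical and are not part of Airflow.

```python
import time
from typing import List, Optional


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Return the VmRSS value (resident memory, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  123456 kB"
            return int(line.split()[1])
    return None


def read_rss_kb(pid: int) -> Optional[int]:
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


def rss_slope_kb_per_sample(samples: List[int]) -> float:
    """Least-squares slope of RSS samples; a persistently positive value suggests a leak."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def sample_rss(pid: int, count: int, interval_s: float) -> List[int]:
    """Collect up to `count` RSS readings from `pid`, `interval_s` seconds apart."""
    samples: List[int] = []
    for i in range(count):
        rss = read_rss_kb(pid)
        if rss is not None:
            samples.append(rss)
        # Skip the sleep after the final reading
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

For example, `rss_slope_kb_per_sample(sample_rss(scheduler_pid, count=60, interval_s=60.0))` would estimate growth over an hour, where `scheduler_pid` is a placeholder for the real PID. A least-squares slope is used instead of comparing only the first and last readings so that short-lived spikes do not dominate the result.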

In Airflow 2.0 and later, the run_duration config option no longer exists, so we cannot even restart the scheduler periodically to work around this issue until it is resolved.

How to reproduce it:

I saw this issue in multiple dev instances of mine, all running Airflow 2.0.1 on Kubernetes with the KubernetesExecutor. These are the config options I changed from the defaults:

- max_active_dag_runs_per_dag=32
- parallelism=64
- dag_concurrency=32
- sql_alchemy_pool_size=50
- sql_alchemy_max_overflow=30

Anything else we need to know:

The scheduler memory leak occurs consistently in every instance I have been running: the scheduler's memory utilization keeps growing.