Worker never running tasks or failing them with no explanation for many simultaneous tasks


Apache Airflow version: 2.0.0rc1

Kubernetes version (if you are using kubernetes) (use kubectl version): 1.19.4

Environment:

  • Cloud provider or hardware configuration: Laptop with 6 cores and 32GB RAM
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.1 LTS
  • Kernel (e.g. uname -a): 5.4.0-56-generic
  • Install tools:
  • Others:

What happened: I am running the 2.0.0 release candidate in minikube with the Celery executor. It was installed from the Helm chart in git, with the executor changed and a persistent volume claim added for storing DAGs. ‘workers.replicas’ is set to 2. I’m testing different scaling options by launching large numbers of tasks and evaluating how quickly and consistently they run. I trigger the DAG manually through the web server. On most runs, either some of the tasks fail with no explanation, or some tasks are left in the ‘queued’ state and never run. The tasks in the ‘queued’ state are shown as ‘active’ in the Flower dashboard but do not appear to be actually running.

As part of my testing I have increased the values of AIRFLOW__CORE__DAG_CONCURRENCY and AIRFLOW__CELERY__WORKER_CONCURRENCY. Raising them seems like it might exacerbate the problem, but I have also reproduced it with the default settings.
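For context, Airflow reads configuration overrides from environment variables named `AIRFLOW__{SECTION}__{KEY}`, so the two variables above correspond to `dag_concurrency` in `[core]` and `worker_concurrency` in `[celery]`. The following is a minimal sketch of that naming convention only, not Airflow's own parser; the helper name `airflow_env_overrides` and the numeric values are hypothetical, since the issue does not state which values were used.

```python
import os


def airflow_env_overrides(environ=None):
    """Map AIRFLOW__{SECTION}__{KEY} variables to {(section, key): value}.

    Illustrative helper (hypothetical name), not part of Airflow.
    """
    environ = os.environ if environ is None else environ
    prefix = "AIRFLOW__"
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # Split the remainder once on the double underscore: SECTION__KEY.
        parts = name[len(prefix):].split("__", 1)
        if len(parts) != 2 or not parts[0] or not parts[1]:
            continue
        section, key = parts
        overrides[(section.lower(), key.lower())] = value
    return overrides


# Hypothetical values; the issue does not say what the settings were raised to.
example = {
    "AIRFLOW__CORE__DAG_CONCURRENCY": "64",
    "AIRFLOW__CELERY__WORKER_CONCURRENCY": "32",
    "PATH": "/usr/bin",
}

for (section, key), value in sorted(airflow_env_overrides(example).items()):
    print(f"[{section}] {key} = {value}")
```

Calling `airflow_env_overrides()` with no argument inspects the current process environment, which can help confirm which overrides a worker pod actually sees.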

What you expected to happen: All tasks run successfully.

What do you think went wrong? Initially I thought I was over-taxing the system, but resource monitoring has shown nothing indicating this. My system has 11 GB of RAM free and 4 CPUs, and CPU utilization never went over 30%.

How to reproduce it: Attached is a simple DAG that reproduces the issue on my setup: concurrent_workflow.zip
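The attachment's contents are not shown here. For readers without it, a DAG of roughly the following shape launches many independent tasks at once. This is a sketch under assumptions, not the reporter's file: the DAG id, the task count, the `sleep 10` command, and the helper names `task_ids` and `build_dag` are all hypothetical, and the imports assume the Airflow 2.0 module layout.

```python
from datetime import datetime

N_TASKS = 50  # hypothetical; the issue does not state how many tasks were launched


def task_ids(n):
    """Return ids for n independent tasks with no dependencies between them."""
    return [f"sleep_{i:03d}" for i in range(n)]


def build_dag(n_tasks=N_TASKS):
    """Build a fan-out DAG of n_tasks parallel sleep tasks (assumes Airflow 2.0)."""
    # Imported lazily so task_ids() can be used without Airflow installed.
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    dag = DAG(
        dag_id="concurrent_workflow_sketch",  # hypothetical name
        start_date=datetime(2020, 12, 1),
        schedule_interval=None,  # no schedule: the DAG is triggered manually
        catchup=False,
    )
    for task_id in task_ids(n_tasks):
        # Each task is independent, so all of them become schedulable at once.
        BashOperator(task_id=task_id, bash_command="sleep 10", dag=dag)
    return dag
```

In a DAG file, assigning `dag = build_dag()` at module level lets the scheduler discover it. Because no task depends on another, every task is eligible to run as soon as the DAG is triggered, which is the load pattern described above.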

Anything else we need to know: I haven’t seen anything indicating an error in the logs, but would be happy to provide them if requested.

How often does this problem occur? Once? Every time etc? The majority of my runs (75-90%) have resulted in between 1 and 4 tasks stuck in the ‘queued’ state. Failed tasks are less frequent (approximately 25% of runs).

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
Squigilum commented, Dec 11, 2020

Thank you – I won’t be able to try these out until Monday but I’ll let you know what I find.

0 reactions
eladkal commented, Sep 23, 2022

This issue was reported against an older version of Airflow. Please check with the latest Airflow version.


Top Results From Across the Web

Airflow 1.9.0 is queuing but not launching tasks - Stack Overflow
Solution: Delete all the previous DAG runs with the old name. Restart everything (webserver, worker, executor, ...) ...
7 Common Errors to Check When Debugging Airflow DAGs
Tasks not running? DAG stuck? Logs nowhere to be found? We've been there.
