queue: Queue status output displays incorrect number of workers
Bug Report
Description
After starting multiple queue task workers with `dvc queue start --jobs 4`, the `dvc queue status` output displays an incorrect number of workers. It first showed "1 active, 0 idle" and then only "0 active, 0 idle", although two tasks were Running (only two tasks had been queued when the workers were started).
Reproduce
- Start workers with `dvc queue start --jobs 4`
- Check the output of `dvc queue status`
Expected
The sum of active and idle queue task workers should match the number of started workers.
Environment information
Output of `dvc doctor`:
$ dvc doctor
DVC version: 2.31.0 (rpm)
---------------------------------
Platform: Python 3.8.3 on Linux-3.10.0-1160.15.2.el7.x86_64-x86_64-with-glibc2.14
Subprojects:
Supports:
azure (adlfs = None, knack = 0.10.0, azure-identity = 1.11.0),
gdrive (pydrive2 = 1.14.0),
gs (gcsfs = None),
hdfs (fsspec = None, pyarrow = 9.0.0),
http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
oss (ossfs = 2021.8.0),
s3 (s3fs = None, boto3 = 1.24.59),
ssh (sshfs = 2022.6.0),
webdav (webdav4 = 0.9.7),
webdavs (webdav4 = 0.9.7),
webhdfs (fsspec = None)
Cache types: hardlink, symlink
Cache directory: xfs on /dev/md124
Caches: local, s3
Remotes: s3, s3
Workspace directory: xfs on /dev/md124
Repo: dvc (subdir), git
Additional Information (if any):
Note that I am running DVC inside a Docker container, though it seems this should be irrelevant.
Issue Analytics
- Created 10 months ago
- Comments: 5
Top GitHub Comments
Checking the worker processes given the process IDs from .dvc/tmp/celery/dcv-exp-worker-?.pid using `ps aux | grep <pid>`, I can see that three of the worker processes (2-4) do not actually exist; only the one for the still-running task does. Maybe the `dvc queue status` output should include a third column for available workers before the maximum limit of allowed workers is reached?

Excuse me, what are the other workers' logs like? It looks like something went wrong with the celery worker.
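The manual check above (reading PID files and grepping `ps` output) can be scripted. Below is a minimal sketch of the same liveness check, assuming the PID-file layout mentioned in the comment; the helper names (`pid_alive`, `report_workers`) are hypothetical, not part of DVC:

```python
import glob
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with the given PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs an existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def report_workers(pid_dir: str = ".dvc/tmp/celery") -> None:
    """Print liveness of each worker whose PID file sits in pid_dir."""
    for pidfile in sorted(glob.glob(os.path.join(pid_dir, "*.pid"))):
        with open(pidfile) as f:
            pid = int(f.read().strip())
        status = "alive" if pid_alive(pid) else "gone"
        print(f"{os.path.basename(pidfile)}: pid {pid} {status}")


if __name__ == "__main__":
    report_workers()
```

Run from the repository root, this would show directly whether the workers `dvc queue status` fails to count are still running or have exited.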