Workers stop processing jobs after Redis reconnect
In production we're using Amazon ElastiCache with BullMQ ^1.34.2.
We're finding that in the event of a failover, the workers emit the error "UNBLOCKED force unblock from blocking operation, instance state changed (master -> replica?)" and stop processing jobs, although jobs can still be queued.
Currently we have to redeploy our app to rectify this. Is there anything we can do to handle this error so that the workers resume processing jobs when Redis reconnects? Thanks.
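
Until a proper fix lands, one possible workaround is to listen for the worker's 'error' event and recreate the worker when this specific error appears. This is a minimal sketch only, assuming BullMQ's Worker API from that version line; the queue name and connection details are hypothetical placeholders:

```typescript
import { Worker, Job } from 'bullmq';

// Hypothetical connection details; substitute your ElastiCache endpoint.
const connection = { host: 'my-elasticache-endpoint', port: 6379 };

function startWorker(): Worker {
  const worker = new Worker(
    'my-queue', // hypothetical queue name
    async (job: Job) => {
      // ... process the job
    },
    { connection },
  );

  // If the blocking Redis call is force-unblocked during a failover,
  // close the stalled worker and start a fresh one instead of redeploying.
  worker.on('error', async (err: Error) => {
    if (err.message.includes('UNBLOCKED')) {
      await worker.close();
      startWorker();
    }
  });

  return worker;
}

startWorker();
```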

I do not see any issue with the naked eye; it should work.
Yeah, I think I know why this happens. There is a loop inside BullMQ that throws an exception in this case and stops looping. We have a fix in the older Bull library that I can port to BullMQ, which should resolve the issue.
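
For reference, a minimal sketch of the kind of change described, using hypothetical names rather than the actual BullMQ internals: instead of letting one failed blocking call terminate the loop, the error is caught and the loop keeps iterating.

```typescript
// Hypothetical fetchNextJob/isClosing helpers; not the real BullMQ source.
async function runLoop(
  fetchNextJob: () => Promise<void>,
  isClosing: () => boolean,
): Promise<void> {
  while (!isClosing()) {
    try {
      // Blocking Redis call; throws UNBLOCKED during a failover.
      await fetchNextJob();
    } catch (err) {
      // Previously an error escaping here ended the loop and left the
      // worker idle. Instead, wait briefly and retry on the next iteration.
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
}
```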