Flow + JobId usage creates "zombie" jobs
The issue happens when doing a backfill, where flow jobs are used to make sure one job runs after the other.
In this situation I don’t care about the output of the child jobs; flow jobs are only used as a synchronization tool.
In the code below, first Monday and Tuesday are queued and they both execute.
Later, when the Tuesday/Wednesday flow is added, Wednesday won’t run because Tuesday has already finished and will never emit a completion event.
Even after deleting Tuesday, Wednesday won’t run.
And later, if I try to schedule Thursday with Wednesday as its child, neither runs, because Wednesday is stuck in a zombie state.
import { delay, FlowProducer, Queue, Worker } from "bullmq";

const queueName = "queue" + Math.random();

new Worker(queueName, async (job) => {
  console.log("working...", job.name);
});

const main = async () => {
  const flowProducer = new FlowProducer();
  const queue = new Queue(queueName);

  await flowProducer.add({
    queueName,
    name: "tue",
    opts: { jobId: "tue" },
    children: [
      {
        name: "mon",
        queueName,
        opts: { jobId: "mon" },
      },
    ],
  });

  await delay(100);
  // console.log: working... mon
  // console.log: working... tue

  // wed will never run because tue has finished
  await flowProducer.add({
    queueName,
    name: "wed",
    opts: { jobId: "wed" },
    children: [
      {
        name: "tue",
        queueName,
        opts: { jobId: "tue" },
      },
    ],
  });

  // after removing tue, wed won't run
  await queue.remove("tue");
  await delay(100);

  // any job that depends on wed won't run either
  await flowProducer.add({
    queueName,
    name: "thu",
    opts: { jobId: "thu" },
    children: [
      {
        name: "wed",
        queueName,
        opts: { jobId: "wed" },
      },
    ],
  });
};

main();
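To make the stuck state visible, a check like the following can be appended to the script above. This is just an illustrative sketch; it reuses queueName from the script and relies on BullMQ’s existing Queue.getJob and Job.getState APIs:

const inspect = async () => {
  const queue = new Queue(queueName);
  for (const id of ["wed", "thu"]) {
    const job = await queue.getJob(id);
    // Neither job ever reaches "completed"; they stay parked forever,
    // and the worker never logs "working... wed" or "working... thu".
    console.log(id, job && (await job.getState()));
  }
};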

@lucasavila00 Yes, as @roggervalf already explained, this is currently working as designed: jobs using the same id as existing jobs in the queue (in any status) are ignored.
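As a minimal sketch of that deduplication behavior (the queue name and payloads here are invented purely for illustration):

import { Queue } from "bullmq";

const demo = async () => {
  const queue = new Queue("dedupe-demo");

  // First add: the job is created normally.
  await queue.add("report", { day: 1 }, { jobId: "report-1" });

  // Second add with the same jobId: an existing job with that id is
  // found (whatever its status), so this call is ignored entirely.
  await queue.add("report", { day: 2 }, { jobId: "report-1" });

  const job = await queue.getJob("report-1");
  console.log(job?.data); // { day: 1 } - the second add was a no-op
};

demo();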
I am not sure why you want to specify custom jobIds, but my guess is that you want to be able to continually add jobs to an existing flow, which sounds like a legitimate use case to me.
So in order to allow for this use case, we need to make some changes. I think it would be enough to update the parent’s dependencies status when a child job with the custom id has already completed, and otherwise just ignore the job as we do now. That implies that in your case above the “tue” job would not be re-processed, just ignored as it is now, but the parent would be processed.
There may be edge cases that we need to figure out, but I think that in principle it should work.
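In the meantime, one possible workaround on the application side is to check whether the child has already completed before building the flow, and enqueue the parent directly in that case. A sketch only: addDayAfter and its arguments are made-up names, and the check is racy (the child could complete between the check and the add):

import { FlowProducer, Queue } from "bullmq";

const addDayAfter = async (queueName, day, prevDay) => {
  const queue = new Queue(queueName);
  const prev = await queue.getJob(prevDay);

  if (prev && (await prev.isCompleted())) {
    // The dependency is already satisfied: enqueue the new day directly.
    await queue.add(day, {}, { jobId: day });
  } else {
    // Otherwise build the flow as usual.
    const flowProducer = new FlowProducer();
    await flowProducer.add({
      queueName,
      name: day,
      opts: { jobId: day },
      children: [{ name: prevDay, queueName, opts: { jobId: prevDay } }],
    });
  }
};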
hi @lucasavila00, I could dig a little into this case. Since the tue job was already added in the first flow, when you add it again in the second one, here https://github.com/taskforcesh/bullmq/blob/master/src/commands/addJob-8.lua#L66-L70 the same job id is returned and no further logic is processed, so the job is not added again and no parent id is attached to it the second time you try to add it; wed stays in the waiting state, and so on. This is the reason for your zombie jobs. Either way, I would like to have @manast’s opinion about this.
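For example, if a check like this is placed right after the second flowProducer.add call in the repro (before queue.remove("tue")), it should show that the existing job never gets linked to its new parent; Job.parentKey is the property that holds that link:

const tue = await queue.getJob("tue");
// addJob-8.lua returned early, so no parent reference was written:
console.log(tue.parentKey); // undefined
const wed = await queue.getJob("wed");
console.log(await wed.getState()); // stuck, waiting on a child that will never notify it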