Job becoming stuck in an inactive state when using cluster

This small program behaves inconsistently, but it usually leaves jobs stuck in the inactive state. The jobs fail because their TTL is exceeded. The problem does not reproduce on every run, so if no jobs get stuck, clear out the jobs and restart the program.

Watching the jobs execute through the UI and Redis, the problem seems to occur more often when there are more inactive jobs than cluster workers available to process them. On my system that means more than 3 jobs (I have a quad core). I think the problem might be in the worker’s ability to recognize that it can pick up another job when a job fails because its TTL was exceeded rather than with an error message.

I am using: Node v4.4.4, npm 2.15.1, Kue 0.10.5

Please note that this program does not manually exit since I didn’t want to interrupt the job execution.

var kue = require('kue');
var cluster = require('cluster');
var os = require('os');

var queue = kue.createQueue();

// Create a 'test' job with a 10 s delay and a 5 s TTL, and log its lifecycle events.
function createTestJob() {
    queue.createJob('test')
        .delay(1000 * 10)
        .ttl(1000 * 5)
        .on('complete', function () {
            console.log('I am done');
        })
        .on('enqueue', function () {
            console.log('I have been enqueued');
        })
        .on('failed', function (err) {
            console.log('I have failed');
        })
        .save(function (err) {
            if (err) console.log(err);
        });
}

if (cluster.isMaster) {
    // Fork one worker per CPU core except one, and queue a job alongside each fork.
    for (var i = 0; i < os.cpus().length - 1; i++) {
        cluster.fork();
        createTestJob();
    }

    // Keep adding one job per second so that inactive jobs outnumber the workers.
    setInterval(createTestJob, 1000);
} else {
    // done() is never called here, so each job fails once its TTL is exceeded.
    queue.process('test', function (job, done) {
        console.log('Processing this job now...?');
    });
}

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 8

Top GitHub Comments

behrad commented, Jun 17, 2016 (2 reactions)

This is in alpha now, and you can test it by installing from the v1 branch 😃

jkrenge commented, Mar 19, 2017 (1 reaction)

When you say the worker should handle TTL itself… is this a suitable implementation, or is it just too dirty (and yes, I know, domains)?

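// `queue` is assumed to be the object returned by kue.createQueue()
queue.process('my-job-queue', 1, function (job, done) {

  var finished = false;

  // Call done() at most once, whichever path (job, domain error, timeout) fires first
  function finish(err) {
    if (finished) return;
    finished = true;
    clearTimeout(timer);
    done(err);
  }

  // Worker-side timeout: fail the job if it has not finished after 60 s
  var timer = setTimeout(function() {
    myApplicationsErrorMonitor('This timed out');
    finish('This timed out');
  }, 60000);

  var domain = require('domain').create();
  domain.on('error', function (err) {
    myApplicationsErrorMonitor(err);
    finish(err);
  });

  // domain.run() expects a function, so wrap the call instead of invoking it inline
  domain.run(function () {
    someFunction(job, function(err) {
      if (err) myApplicationsErrorMonitor(err);
      finish(err);
    });
  });

});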
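
The idea in the comment above (a worker-side timeout that makes sure done() is called only once) can also be sketched without domains. The following is a minimal sketch, not Kue's own TTL mechanism: `withTimeout` is a hypothetical helper name, and it only assumes that the handler has the `(job, done)` shape used by `queue.process` earlier in this thread.

```javascript
// Hypothetical helper (not part of Kue): wraps a (job, done) handler so that
// done() is called at most once, whether by the handler itself, by a
// synchronous throw inside the handler, or by a local timeout.
function withTimeout(handler, timeoutMs) {
  return function (job, done) {
    var finished = false;
    var timer = null;

    // Single exit point: later calls are ignored once the job has been settled.
    function finish(err, result) {
      if (finished) return;
      finished = true;
      clearTimeout(timer);
      done(err, result);
    }

    // Start the timeout before running the handler.
    timer = setTimeout(function () {
      finish(new Error('Job timed out after ' + timeoutMs + ' ms'));
    }, timeoutMs);

    try {
      handler(job, finish);
    } catch (err) {
      // Only synchronous throws are caught here; errors thrown later from
      // asynchronous callbacks still need separate handling.
      finish(err);
    }
  };
}
```

A handler would then be registered as, for example, `queue.process('test', withTimeout(myHandler, 4000))`, where `myHandler` is a placeholder for the real job function. Choosing a timeout shorter than the job's TTL is an assumption of this sketch: it lets the worker report the failure itself before the TTL expires.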