Job becoming stuck in an inactive state when using cluster

This small program does not behave the same way on every run, but jobs usually become stuck in the inactive state. The jobs fail because their TTL is exceeded. If the problem does not occur on a given run, clear out the jobs and restart the program.

Watching the jobs execute through the UI and Redis, the problem seems to occur more often when there are more inactive jobs than cluster workers available to process them. On my system that means more than 3 jobs (I have a quad-core machine). I think the problem might be in the worker’s ability to recognize that it can pick up another job when a job fails because its TTL was exceeded rather than with an error message.
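To make the hypothesis above concrete, here is a toy model of a fixed pool of workers. It is only a sketch and does not reflect how Kue is implemented: the `simulate` function and its bookkeeping are invented for illustration. It assumes each worker asks for its next job only after `done()` is called, so a job that ends some other way (for example, by a TTL failure the worker never notices) keeps its slot occupied.

```javascript
// Toy model only: this is NOT Kue's implementation. Each "worker" takes one
// job at a time and only asks for the next one after done() is called.
function simulate(workerCount, jobCount, handler) {
  var inactive = [];
  for (var i = 0; i < jobCount; i++) inactive.push('job-' + i);
  var active = 0;

  function next() {
    if (inactive.length === 0) return;
    var job = inactive.shift();
    active++;
    handler(job, function done() {
      active--;
      next(); // the slot is released only here
    });
  }

  // Start every worker with one job each.
  for (var w = 0; w < workerCount; w++) next();
  return { inactive: inactive.length, active: active };
}

// A handler that never calls done(), like the one in the repro program.
var stuck = simulate(3, 5, function (job, done) {});

// A handler that calls done() right away.
var healthy = simulate(3, 5, function (job, done) { done(); });

console.log(stuck);   // 3 slots occupied, 2 jobs left waiting
console.log(healthy); // nothing left waiting
```

In this model, with 3 workers and 5 jobs, the two extra jobs stay inactive for as long as the first three never release their slots, which matches the observation that the problem shows up once jobs outnumber workers.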

I am using: Node v4.4.4, npm 2.15.1, Kue 0.10.5

Please note that this program does not manually exit since I didn’t want to interrupt the job execution.

var kue = require('kue');
var queue = kue.createQueue();
var cluster = require('cluster');
var os = require('os');

// Create a delayed job (10 s) with a 5 s TTL and log its lifecycle events.
function createTestJob() {
    queue.createJob('test').delay(1000 * 10).ttl(1000 * 5).on('complete', function () {
        console.log('I am done');
    }).on('enqueue', function () {
        console.log('I have been enqueued');
    }).on('failed', function (err) {
        console.log('I have failed');
    }).save(function (err) {
        if (err) console.log(err);
    });
}

if (cluster.isMaster) {
    // Fork one worker per CPU core minus one, and create one job per worker.
    for (var i = 0; i < os.cpus().length - 1; i++) {
        cluster.fork();
        createTestJob();
    }

    // Keep adding one job per second so that jobs outnumber workers.
    setInterval(createTestJob, 1000);
}
else {
    // The handler never calls done(), so each job can only end by exceeding its TTL.
    queue.process('test', function (job, done) {
        console.log('Processing this job now...?');
    });
}

Top GitHub Comments

behrad commented, Jun 17, 2016 (2 reactions)

This is in alpha now, and you can test it by installing from the v1 branch 😃

jkrenge commented, Mar 19, 2017 (1 reaction)

When you say the worker should handle TTL itself… is this a suitable implementation, or just too dirty (and yes, I know, domains)?

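// `queue` is the object returned by kue.createQueue(); someFunction and
// myApplicationsErrorMonitor are the application's own placeholders.
queue.process('my-job-queue', 1, function (job, done) {

  var finished = false; // guards against calling done() more than once
  var timer;

  function finish(err) {
    if (finished) return;
    finished = true;
    clearTimeout(timer);
    if (err) myApplicationsErrorMonitor(err);
    done(err);
  }

  // Worker-side timeout: fail the job if it has not finished after 60 s.
  timer = setTimeout(function () {
    finish(new Error('This timed out'));
  }, 60000);

  var domain = require('domain').create();
  domain.on('error', finish);

  // Wrap the call in a function; domain.run(someFunction(...)) would invoke
  // someFunction immediately, outside the domain.
  domain.run(function () {
    someFunction(job, finish);
  });

});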
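
The same idea can be packaged without domains. The sketch below is one possible shape, not something Kue provides: `withTimeout` is a hypothetical helper name, and it assumes the handler follows the usual `(job, done)` callback convention. It makes sure `done()` is called exactly once, either by the handler or by the timer, and it does not catch errors thrown asynchronously inside the handler.

```javascript
// Hypothetical helper (not part of Kue): wraps a (job, done) handler so that
// done() is called exactly once, either by the handler or by the timer.
function withTimeout(handler, ms) {
  return function (job, done) {
    var finished = false;

    function finish(err, result) {
      if (finished) return; // ignore late or duplicate completions
      finished = true;
      clearTimeout(timer);
      done(err, result);
    }

    // Start the timer before running the handler so it can always be cleared.
    var timer = setTimeout(function () {
      finish(new Error('Timed out after ' + ms + ' ms'));
    }, ms);

    try {
      handler(job, finish);
    } catch (err) {
      finish(err); // synchronous throws fail the job
    }
  };
}
```

It could then be passed to the queue in place of the raw handler, for example `queue.process('my-job-queue', 1, withTimeout(myHandler, 60000))`, where `myHandler` stands for the application's own job function.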