ReplyError: ERR Unknown worker 22dded38

Exception

(node:43609) UnhandledPromiseRejectionWarning: ReplyError: ERR Unknown worker 22dded38
    at parseError (/Users/laksh/Desktop/repos/kaiju/node_modules/redis-parser/lib/parser.js:179:12)
    at parseType (/Users/laksh/Desktop/repos/kaiju/node_modules/redis-parser/lib/parser.js:302:14)
node_modules/source-map-support/source-map-support.js:495
(node:43609) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 2)

node.js version: v12.16.2

npm/yarn version: 6.14.11

faktory-server version: 1.4.2

faktory-worker package version: 4.0.1

Code sample:

try {
  this.worker = await this.faktory.work({
    ...config
  })
} catch (error) {
  logger.error({ error }, 'faktory-worker error')
}

Expected Behavior:

Gracefully return a promise rejection in faktory.work()

Actual Behavior:

(node:43609) UnhandledPromiseRejectionWarning: ReplyError: ERR Unknown worker 22dded38
    at parseError (/Users/laksh/Desktop/repos/kaiju/node_modules/redis-parser/lib/parser.js:179:12)
    at parseType (/Users/laksh/Desktop/repos/kaiju/node_modules/redis-parser/lib/parser.js:302:14)
node_modules/source-map-support/source-map-support.js:495
(node:43609) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 2)

Steps to reproduce the behavior or a test case:

  • It’s rather sporadic, but I can usually reproduce it by starting my test env, pausing on a breakpoint, and waiting for more than a minute. (This seems related to worker heartbeats: if a worker doesn’t beat for more than one minute, the faktory-server closes the worker connection, right?)

Looking into the faktory_worker_node issues, I found this in #70:

Just covering the bases. There’s a try/catch around worker.js:76, so I’m trying to make sense of the stack trace for the unhandled rejection.

https://github.com/jbielick/faktory_worker_node/blob/99bab78446be27280dc316fa76c6af5d67d4516c/lib/worker.js#L75-L81

I know there are some similar rules around not listening to emitted “error” events, and one is emitted on 79 in the case that fetch or handle fails, so I haven’t ruled that out yet, either.

I suspect the error happens before the worker can start the fetch-and-handle job loop.

^^ the above is not wrapped in a try/catch block? Could this be the issue?

The error is returned from the Faktory server here: https://github.com/contribsys/faktory/blob/b5549cc1cc6cdb19592db69f2da8a8881f69a13d/server/commands.go#L223

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

2 reactions
jbielick commented, Mar 29, 2021

Thanks for all the info, y’all. I love the extra detail.

My first impression isn’t super helpful, but it’s important to point out:

I’m suspecting that the worker is stuck in a long-running synchronous job which blocks the event loop from sending a heartbeat until it’s too late.

This is almost definitely what’s happening if you’re getting “evicted” from the server. The heartbeat works via setInterval, and can only execute if the event loop is not blocked. I would definitely check if you’re blocking the event loop in your workers (bad) because this is something that will cause all kinds of trouble.
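
As a rough illustration of that point (a standalone sketch, not faktory-worker code; blockFor and the 10 ms interval are made up for the demo), a timer callback cannot run while synchronous code is holding the event loop:

```javascript
// Hypothetical helper: busy-wait to simulate a long-running synchronous job.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin; nothing else can run on the event loop meanwhile
  }
}

let beats = 0;
// Stand-in for a heartbeat timer that should fire every 10 ms.
const heartbeat = setInterval(() => {
  beats += 1;
}, 10);

blockFor(100); // ten intervals' worth of time passes here
const beatsDuringBlock = beats; // still 0: the callback never got a turn

clearInterval(heartbeat);
console.log(`beats during 100 ms of blocking work: ${beatsDuringBlock}`);
```

If a job handler spins like this for longer than the server's heartbeat window, no beat goes out in that time, which would fit the eviction described above.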

With that in mind, it would be great for the client to handle this rejection somehow, which I believe is the goal here.

I think the first thing I’ll suggest is upgrading to 4.1.0 or later, as a try/catch was introduced around the heartbeat:

    this.heartbeat = setInterval(async () => {
      try {
        await this.beat();
      } catch (error) {
        this.emit(
          "error",
          new Error(`Couldn't send heartbeat to the server: ${error.stack}`)
        );
      }
    }, this.beatInterval);

https://github.com/jbielick/faktory_worker_node/commit/d95b36718625838d1f185cef497896daaad180f3

This will emit an error, so a worker.on("error", ...) listener is still necessary to be notified of it. At the very least, it won’t produce an unhandled rejection.
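
For context on why the listener matters, here is a minimal sketch using Node's built-in EventEmitter as a stand-in. fakeWorker is hypothetical, and this assumes the worker behaves like a standard EventEmitter, which the this.emit call in the snippet suggests. Emitting "error" with no listener throws; with a listener attached, the error is delivered to it instead:

```javascript
const { EventEmitter } = require("events");

// Stand-in for the worker; NOT the real faktory-worker class.
const fakeWorker = new EventEmitter();

// 1) No "error" listener: EventEmitter throws the emitted error.
let threwWithoutListener = false;
try {
  fakeWorker.emit("error", new Error("Couldn't send heartbeat to the server"));
} catch (error) {
  threwWithoutListener = true;
}

// 2) With a listener: the error is handed to the listener and nothing throws.
const received = [];
fakeWorker.on("error", (error) => {
  received.push(error.message);
});
fakeWorker.emit("error", new Error("Couldn't send heartbeat to the server"));

console.log({ threwWithoutListener, received });
```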

For the code and the intention behind some of what was referenced:

https://github.com/jbielick/faktory_worker_node/blob/297a82f71bd94adac96d922945957f2e5b1ff4a1/src/worker.ts#L201-L216

The beat call on line 205 is not wrapped in try/catch because the goal there is to ensure a connection to the server is possible and, if not, to reject the work() promise immediately before attempting to start working. In other words, if the first beat fails when starting work, reject the promise in the user code that called await faktory.work(). So that one is intentionally not caught.
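
A simplified sketch of that split, assuming nothing about the real implementation beyond what is described here (SketchWorker, its constructor arguments, and the errors array are all hypothetical; the real worker emits an "error" event rather than collecting errors):

```javascript
// Hypothetical stand-in, not the real faktory-worker Worker class.
class SketchWorker {
  constructor(beat, beatInterval) {
    this.beat = beat; // async function that sends one heartbeat
    this.beatInterval = beatInterval;
    this.errors = [];
  }

  async work() {
    // First beat is deliberately NOT caught: a failure rejects work().
    await this.beat();

    // Later beats are caught so they cannot become unhandled rejections.
    this.heartbeat = setInterval(async () => {
      try {
        await this.beat();
      } catch (error) {
        this.errors.push(error);
      }
    }, this.beatInterval);
    return this;
  }

  stop() {
    clearInterval(this.heartbeat);
  }
}

// A beat that always fails, as if the server did not know this worker.
const failing = new SketchWorker(async () => {
  throw new Error("ERR Unknown worker");
}, 1000);

// The caller sees the failure as a rejection of work() and can catch it.
const outcome = failing.work().then(
  () => "resolved",
  (error) => `rejected: ${error.message}`
);
outcome.then((result) => console.log(result));
```

With this shape, a try/catch around await work() in user code only covers the startup beat; anything that fails later has to surface some other way, such as the "error" event.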

When an unknown worker error happens, will faktory-server re-establish the connection with this worker, or will this worker be considered dead?

I don’t think faktory_worker_node currently does this. It could be added, but the best-case scenario (IMO) is for the worker to shut down when “unknown worker” is returned from the server and let process supervisors restart the process. I don’t know exactly how re-handshaking with the server looks.

Should we do something on our end to re-establish the connection or create a new worker with faktory-server (using faktory-node-worker)?

I think the worker shutting down is probably the easiest / least complex way to handle this. The underlying issue is a blocked event loop, which can produce an unstable node process, but with proper process supervision it would be very clean to self-destruct (shut down) and then restart the worker process itself.

I think for that to happen we would need a self-destruct worker.stop() somewhere when receiving ERR from the server or, possibly more specifically, ERR unknown worker from the server.
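
Until something like that exists in the library, a user-land version of the idea might look roughly like this. It is only a sketch: stopOnUnknownWorker and standIn are hypothetical names, the worker is assumed to be an EventEmitter with a stop() method, and the check assumes the server's "Unknown worker" text appears in the emitted error's stack:

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper: stop the worker once the server no longer knows it.
// `worker` is assumed to be an EventEmitter exposing stop().
function stopOnUnknownWorker(worker, onStopped) {
  worker.on("error", (error) => {
    if (/unknown worker/i.test(String(error.stack || error))) {
      worker.stop();
      onStopped(error);
    }
  });
}

// Stand-in worker for demonstration only; NOT the real faktory-worker class.
const standIn = new EventEmitter();
standIn.stopped = false;
standIn.stop = () => {
  standIn.stopped = true;
};

let exitCode = 0;
stopOnUnknownWorker(standIn, () => {
  exitCode = 1; // a supervised process might call process.exit(1) here
});

standIn.emit("error", new Error("connection reset")); // unrelated: ignored
const stoppedAfterUnrelated = standIn.stopped;

standIn.emit("error", new Error("ERR Unknown worker 22dded38"));
console.log({ stoppedAfterUnrelated, stopped: standIn.stopped, exitCode });
```

Exiting with a non-zero code after stopping leaves the restart to the process supervisor, which is the division of labor suggested above.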

0 reactions
Laksh47 commented, Mar 26, 2021

Not sure if this is the right fix, but changing parser.ts#L19

returnError: (err) => {
    // Emit only when a consumer is listening for 'error'; otherwise just log it.
    if (this.listeners('error').length > 0) {
        this.emit('error', err)
    } else {
        console.warn(err)
    }
}

seems to handle the unhandled exception! I don’t know how this could be properly emitted as an ‘error’ back to the consumers without triggering the UnhandledPromiseRejectionWarning.

@jbielick
