AggregatorRegistry assumes all workers will have metrics setup

Hello,

thanks for prom-client, it’s a huge time saver for us!

I have run into an issue with the cluster aggregator support. It seems that unless all the workers of a node cluster are set up with prom-client, the aggregator won’t work.

The issue seems to be in https://github.com/siimon/prom-client/blob/bb09c6dea7941f54aa7a62fcfa5f052af9d00d5a/lib/cluster.js#L41-L74

We have workers in our node cluster that do not import prom-client for various reasons, hence I need a way to instruct prom-client to not bother contacting them.

I’m happy to provide a PR for this, but wanted to present possible solutions first:

  1. Instead of rejecting the promise when a timeout happens, silently ignore the timeout.
  2. Same as (1) but report non-responsive workers as a separate metric for monitoring.
  3. Instead of sending the GET_METRICS_REQ to all the workers, maintain a list of workers that have set up the listeners and send only to those.
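
To make option (3) concrete, here is a minimal sketch of the bookkeeping it implies. It is not prom-client code: the class name `MetricsWorkerTracker`, its methods, and the message type string are all hypothetical and only illustrate the idea of sending the request to workers that have opted in. The `workers` argument is assumed to have the same shape as Node's `cluster.workers` (an object keyed by worker id, whose values have `id` and `send()`).

```javascript
// Hypothetical placeholder for the request message type; the real constant
// lives in prom-client's lib/cluster.js and may have a different value.
const GET_METRICS_REQ = 'example:getMetricsReq';

// Illustrative only: tracks which workers have announced that they
// have set up their metrics listeners.
class MetricsWorkerTracker {
  constructor() {
    this.readyWorkerIds = new Set();
  }

  // Call when a worker announces that its listeners are in place.
  markReady(workerId) {
    this.readyWorkerIds.add(workerId);
  }

  // Call when a worker exits, so it is no longer contacted.
  markGone(workerId) {
    this.readyWorkerIds.delete(workerId);
  }

  // Sends the request only to workers that opted in.
  // Returns how many responses the caller should wait for.
  requestMetrics(workers, requestId) {
    let pending = 0;
    for (const worker of Object.values(workers)) {
      if (!this.readyWorkerIds.has(worker.id)) continue;
      worker.send({ type: GET_METRICS_REQ, requestId });
      pending += 1;
    }
    return pending;
  }
}
```

With this shape, workers that never import prom-client are never contacted, so there is nothing to time out on for them.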

Thoughts?

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 5

Top GitHub Comments

3 reactions
orestis commented, Mar 27, 2018

I’m trying to think how this can be done in a way that gives the consumer of the library control over the initialisation process, so I’m thinking aloud a bit here:

  • The current implementation that contacts all workers has no ordering issue, but suffers when workers are by design non-responsive. However, it still depends on cluster.js being required in all the workers. This is currently done by the top-level index.js, hence require('prom-client') does it.

  • The implementation at #182 expects the AggregatorRegistry to be created in the master before forking, and also that clients require prom-client.

  • Since workers interested in metrics must require prom-client anyway, I think it’s safe to keep the cluster support setup in workers implicit, as it is now. This way worker code doesn’t have to worry about whether it’s running in a cluster or not.

  • The master process of a cluster must explicitly opt-in to using the AggregatorRegistry and exposing the metrics via a web endpoint. Hence the master code is already aware of the cluster in play. In our codebase we try to abstract this away but we do have to check for cluster.isMaster and switch to a different behaviour.

  • The simple approach is to require users of the library to create the AggregatorRegistry before forking (as mentioned already). However, I understand that this will break existing code that expected things to be automatic.

  • Having a discovery phase etc. might be a good long-term solution that would also deal with workers dying. I’ve already added some code that tries to do the right thing but I haven’t tested it thoroughly.

  • Perhaps in the interests of backwards compatibility, this new functionality could be put under some option or some different class altogether e.g. CoordinatedAggregatorRegistry. Users of the current AggregatorRegistry will still get the previous behaviour.

  • I will test a bit more against our codebase and report back.
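
The ordering concern in the points above (the master opts in and starts listening before forking, and dead workers are forgotten) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `forkWithReadyTracking` and the `METRICS_READY` message type are hypothetical names, and `clusterModule` is assumed to behave like `require('node:cluster')` in the master process, which emits `'message'` as `(worker, message)` and `'exit'` as `(worker, code, signal)`.

```javascript
// Hypothetical message type a worker would send once its metrics
// listeners are set up; prom-client defines no such message.
const METRICS_READY = 'example:metricsReady';

// Registers listeners on the cluster module, then forks the workers.
// Returns a live Set of ids of workers that announced readiness.
function forkWithReadyTracking(clusterModule, workerCount) {
  const readyWorkerIds = new Set();

  // Listen before forking, so the listener is guaranteed to be in place
  // by the time any worker announces itself.
  clusterModule.on('message', (worker, message) => {
    if (message && message.type === METRICS_READY) {
      readyWorkerIds.add(worker.id);
    }
  });

  // Forget workers that exit, so later requests do not wait on them.
  clusterModule.on('exit', (worker) => {
    readyWorkerIds.delete(worker.id);
  });

  for (let i = 0; i < workerCount; i += 1) {
    clusterModule.fork();
  }
  return readyWorkerIds;
}

// Possible usage in the master (not executed here):
//   const cluster = require('node:cluster');
//   if (cluster.isMaster) {
//     const ready = forkWithReadyTracking(cluster, 4);
//   }
```

Workers that never send the announcement simply never enter the set, which matches the case of workers that do not import prom-client.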

0 reactions
orestis commented, Mar 27, 2018

I’ve updated the PR. It is now deployed in our staging environment and seems to be ticking along nicely. If the general approach is accepted, I can write some tests covering this new option.

