Improve Event Loop Lag Metric
Hello,
Thank you for all the work on this library! We are using our own implementation of the event loop lag metric because the default metric did not work well for us, and I was wondering whether this is something we could contribute back. Here are the two things we have changed:
Reset the libuv histogram
The mean/max/percentiles that libuv provides (through perf_hooks.monitorEventLoopDelay) are never reset. After a couple of days, all numbers have essentially settled and no longer change, so there is no way to distinguish between high-load periods during the day and quiet periods at night. I think it would be better to expose a moving average/min/max instead of the all-time average/min/max.
By default, the resolution that libuv uses is 10 ms, which means that libuv generates ~100 measurements per second. It would be possible to reset libuv's histogram every second (or so) and record its mean and max in circular buffers. The buffers should cover a bit more than Prometheus' scrape interval, and their contents get averaged when collected.
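A minimal sketch of the reset-and-buffer idea, assuming a CommonJS Node.js module. RingBuffer, startLagSampler, and the default of 20 one-second samples are hypothetical names and values for illustration, not part of prom-client. monitorEventLoopDelay, enable(), disable(), reset(), mean, and max are the real perf_hooks histogram APIs, and their values are in nanoseconds.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Hypothetical fixed-capacity circular buffer: once full, each push
// overwrites the oldest sample.
class RingBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.values = new Array(capacity);
    this.size = 0; // number of slots currently filled
    this.next = 0; // slot the next push writes to
  }

  push(value) {
    this.values[this.next] = value;
    this.next = (this.next + 1) % this.capacity;
    this.size = Math.min(this.size + 1, this.capacity);
  }

  // Average of the buffered samples, or NaN if nothing has been recorded yet.
  mean() {
    if (this.size === 0) return NaN;
    let sum = 0;
    for (let i = 0; i < this.size; i++) sum += this.values[i];
    return sum / this.size;
  }
}

// Hypothetical sampler: every `sampleMs`, copy the histogram's mean and max
// (nanoseconds) into ring buffers, then reset the histogram so each sample
// only covers the last interval. `bufferLength * sampleMs` should be a bit
// longer than the Prometheus scrape interval (e.g. 20 s for a 15 s scrape).
function startLagSampler({ resolution = 10, sampleMs = 1000, bufferLength = 20 } = {}) {
  const histogram = monitorEventLoopDelay({ resolution });
  histogram.enable();

  const means = new RingBuffer(bufferLength);
  const maxes = new RingBuffer(bufferLength);

  const timer = setInterval(() => {
    // Skip intervals that recorded no samples (guard against a non-finite mean).
    if (Number.isFinite(histogram.mean)) {
      means.push(histogram.mean);
      maxes.push(histogram.max);
    }
    histogram.reset();
  }, sampleMs);
  timer.unref(); // don't keep the process alive just for sampling

  return {
    // Called at scrape time: average the buffered values and convert ns -> s.
    collect() {
      return { meanSeconds: means.mean() / 1e9, maxSeconds: maxes.mean() / 1e9 };
    },
    stop() {
      clearInterval(timer);
      histogram.disable();
    },
  };
}
```

Following the proposal, both the per-second means and the per-second maxima are averaged at collection time; taking the maximum of the max buffer instead would be an equally simple variation.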
Instead of the circular buffer, a histogram could be used as well (proposed in #309).
There is some discussion about this in #278.
Subtract resolution from values
libuv uses a timer and records the time that has passed since the last invocation. This means that every value libuv provides has the resolution value “added” to it. I think it would be better if this library subtracted this number before exposing the metrics. This could also be done in Prometheus, but it is a rather specific implementation detail, and it can vary from process to process because the resolution is configurable.
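A minimal sketch of the subtraction, assuming readings come from a monitorEventLoopDelay histogram (nanoseconds) created with a known resolution (milliseconds). lagSeconds is a hypothetical helper, and clamping at zero is an assumption of this sketch so that measurement jitter cannot produce a negative lag.

```javascript
// Hypothetical helper: convert a raw monitorEventLoopDelay reading (ns) into
// event loop lag in seconds by removing the timer resolution (ms) that every
// sample includes. Clamped at 0 so jitter never yields a negative lag.
function lagSeconds(rawNanoseconds, resolutionMs) {
  const lagNanoseconds = rawNanoseconds - resolutionMs * 1e6;
  return Math.max(0, lagNanoseconds) / 1e9;
}
```

Because every sample carries the same offset, subtracting the resolution from the mean, min, max, or any percentile shifts each of them by the same constant, so the helper can be applied to the histogram's summary values directly.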
There is some discussion about changing how libuv measures the lag but this didn’t get merged: https://github.com/nodejs/node/pull/32018 and https://github.com/nodejs/node/pull/32102. There is also some future work in https://github.com/libuv/libuv/pull/2725.
I am happy to open a PR for this so that you can have a look.
Issue Analytics
- Created 3 years ago
- Reactions: 8
- Comments: 9 (3 by maintainers)
Hello! Sorry, I totally forgot about this. I have some time this weekend to open a PR with our implementation.
It seems that @ChristianBoehlke is too busy to respond to this.
From our side, I would like to share our experience using the fork from https://github.com/siimon/prom-client/pull/459.
We now have exactly the information we wanted to see from the beginning.
I still don't see why we shouldn't just use libuv's histogram, provided by monitorEventLoopDelay, collect its precalculated stats, and then reset the histogram. No data is lost this way, and users see accurate stats for each measurement period, as the screenshot above shows.
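A minimal sketch of the collect-and-reset approach, assuming it is called once per Prometheus scrape from a CommonJS Node.js module. collectAndReset and createLagHistogram are hypothetical helpers; min, max, mean, stddev, percentile(), reset(), enable(), and disable() are real members of the histogram returned by perf_hooks.monitorEventLoopDelay, and its values are in nanoseconds.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Hypothetical factory: create and start an event loop delay histogram.
function createLagHistogram(resolution = 10) {
  const histogram = monitorEventLoopDelay({ resolution });
  histogram.enable();
  return histogram;
}

// Hypothetical collector: read the histogram's precomputed stats for the
// period since the last call, then reset it so the next scrape only sees
// new samples. All values are converted from nanoseconds to seconds.
function collectAndReset(histogram) {
  const toSeconds = (ns) => ns / 1e9;
  const snapshot = {
    min: toSeconds(histogram.min),
    max: toSeconds(histogram.max),
    mean: toSeconds(histogram.mean),
    stddev: toSeconds(histogram.stddev),
    p50: toSeconds(histogram.percentile(50)),
    p90: toSeconds(histogram.percentile(90)),
    p99: toSeconds(histogram.percentile(99)),
  };
  histogram.reset();
  return snapshot;
}
```

Because the histogram is reset only after it has been read, each sample is reported in exactly one period, assuming a single scraper; if several Prometheus servers scrape the same process, each scrape would also reset the period seen by the others.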