
Improve Event Loop Lag Metric

Hello,

Thank you for all the work on this library! We are using our own implementation of the event loop lag metric because the default metric did not work well for us, and I was wondering whether this is something we could contribute back. Here are the two things we have changed:

Reset libuv-Histogram

The mean/max/percentiles that libuv provides (through perf_hooks.monitorEventLoopDelay) are never reset. After a couple of days all numbers have effectively settled and no longer change. There is no way to even distinguish between high-load times during the day and quiet times during the night. I think it would be better to expose a moving average/min/max instead of the total average/min/max.

By default the resolution that libuv uses is 10 ms, which means that libuv generates ~100 measurements per second. It would be possible to reset libuv’s histogram every second (or so) and record its mean and max in circular buffers. The buffers should cover a bit more than Prometheus’ scrape interval, and their contents would be averaged when collected. Instead of the circular buffer, a histogram could be used as well (proposed in #309). There is some discussion about this in #278.
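
The circular-buffer idea above can be sketched roughly as follows. This is a minimal illustration and not the implementation proposed for the library: `RingBuffer`, `startLagSampler`, and the option names (`resolutionMs`, `windowMs`, `windows`) are hypothetical, and taking the maximum (rather than the average) of the per-window maxima is an assumption made here. Only `perf_hooks.monitorEventLoopDelay` and its `enable`, `disable`, `reset`, `mean`, and `max` members are real Node.js APIs.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Hypothetical fixed-size ring buffer: once full, the oldest entry is overwritten.
class RingBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
    this.next = 0; // index that the next push writes to
  }

  push(value) {
    if (this.items.length < this.capacity) {
      this.items.push(value);
    } else {
      this.items[this.next] = value;
    }
    this.next = (this.next + 1) % this.capacity;
  }

  mean() {
    if (this.items.length === 0) return 0;
    return this.items.reduce((sum, v) => sum + v, 0) / this.items.length;
  }

  max() {
    return this.items.length === 0 ? 0 : Math.max(...this.items);
  }
}

// Hypothetical sampler: every `windowMs` it stores the histogram's mean and max
// (converted from nanoseconds to milliseconds) and then resets the histogram.
// `windows * windowMs` should be somewhat longer than the scrape interval.
function startLagSampler({ resolutionMs = 10, windowMs = 1000, windows = 20 } = {}) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  const means = new RingBuffer(windows);
  const maxes = new RingBuffer(windows);
  histogram.enable();

  const timer = setInterval(() => {
    // An empty histogram reports NaN as its mean; skip such windows.
    if (!Number.isNaN(histogram.mean)) {
      means.push(histogram.mean / 1e6);
      maxes.push(histogram.max / 1e6);
    }
    histogram.reset();
  }, windowMs);
  timer.unref(); // do not keep the process alive just for sampling

  return {
    // Called at scrape time: average of window means, largest window max.
    collect() {
      return { meanMs: means.mean(), maxMs: maxes.max() };
    },
    stop() {
      clearInterval(timer);
      histogram.disable();
    },
  };
}
```

With the defaults above, `collect()` summarizes roughly the last 20 seconds, so a value scraped at night reflects the night and not the whole process lifetime.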

Subtract resolution from values

libuv uses a timer and records the time that has passed since the last invocation. This means that all values that libuv provides have the resolution value “added” to them. I think it would be better if this library subtracted this number before exposing the metrics. This could also be done in Prometheus, but it is a rather specific implementation detail and could vary from process to process because the resolution is configurable. There is some discussion about changing how libuv measures the lag, but this didn’t get merged: https://github.com/nodejs/node/pull/32018 and https://github.com/nodejs/node/pull/32102. There is also some future work in https://github.com/libuv/libuv/pull/2725.
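
As a rough sketch of the subtraction described above: the helper names `toLagMs` and `adjustedStats` are hypothetical, and clamping negative results to zero is an assumption made here, not something stated in the issue. The histogram's `mean`, `max`, and `percentile()` members are the real `perf_hooks` API and report nanoseconds.

```javascript
const NS_PER_MS = 1e6;

// Converts a raw histogram value (nanoseconds, including the timer interval)
// to lag in milliseconds. Clamped at zero, since a timer can occasionally
// fire slightly earlier than the configured resolution.
function toLagMs(rawNs, resolutionMs) {
  return Math.max(0, rawNs / NS_PER_MS - resolutionMs);
}

// Reads mean, max, and p99 from a histogram created with
// perf_hooks.monitorEventLoopDelay({ resolution: resolutionMs }).
function adjustedStats(histogram, resolutionMs) {
  return {
    meanMs: toLagMs(histogram.mean, resolutionMs),
    maxMs: toLagMs(histogram.max, resolutionMs),
    p99Ms: toLagMs(histogram.percentile(99), resolutionMs),
  };
}
```

The same `resolutionMs` has to be passed to both `monitorEventLoopDelay` and `adjustedStats`, which is why doing the subtraction in the process rather than in Prometheus avoids per-process bookkeeping.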

I am happy to open a PR for this so that you can have a look.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 8
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

6 reactions
ChristianBoehlke commented, Feb 25, 2021

Hello! Sorry, I totally forgot about this. I have some time this weekend to open a PR with our implementation.

4 reactions
yarsky-tgz commented, Sep 9, 2021

It seems that @ChristianBoehlke is too busy to respond to this.

From our side, I would like to share our experience of using the fork from https://github.com/siimon/prom-client/pull/459.

We now have exactly the information that we wanted to see from the beginning:

[screenshot]

“It would be possible to reset libuv’s histogram every second (or so) and record its mean and max in circular buffers. The length of the buffers should be a bit longer than Prometheus’ scrape interval and get averaged when collected. Instead of the circular buffer, a histogram could be used as well (proposed in #309).”

I still do not see why we should not just use libuv’s histogram, provided by monitorEventLoopDelay: collect its precalculated stats and then reset the histogram. No data is lost that way, and the user sees accurate stats for the measurement period, as you can see in the screenshot above.
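
The collect-then-reset approach from this comment might look roughly like the sketch below. It is not the code from the linked pull request: `snapshotAndReset` is a hypothetical helper, and reporting zeros for a window without samples is an assumption made here. Wiring the returned values into prom-client gauges is left out.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Hypothetical helper, meant to be called once per scrape: reads the
// precalculated stats (nanoseconds -> milliseconds), then resets the histogram
// so the next scrape only covers the time since this one.
function snapshotAndReset(histogram) {
  const toMs = (ns) => ns / 1e6;
  // An empty histogram reports NaN as its mean; report zeros for that window.
  const empty = Number.isNaN(histogram.mean);
  const snapshot = {
    meanMs: empty ? 0 : toMs(histogram.mean),
    maxMs: empty ? 0 : toMs(histogram.max),
    p99Ms: empty ? 0 : toMs(histogram.percentile(99)),
  };
  histogram.reset();
  return snapshot;
}

// Usage: create and enable the histogram once at startup.
const histogram = monitorEventLoopDelay({ resolution: 10 });
histogram.enable();
// Later, inside the metrics handler: const stats = snapshotAndReset(histogram);
```

One trade-off to keep in mind: because every call resets the histogram, this assumes a single scraper. With several scrapers, each would only see the samples recorded since the previous scrape by any of them.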
