Feature: Create a metrics view

We want to display a metrics view/page in the IPFS desktop app and webui. Ideally, this view would allow users to view any/all metrics they need, without having to access the CLI or other tools. This view would not include tools or complicated features for processing those metrics; users who need that would use the Diagnostics View instead.

Original issue description

@olizilla suggested we could add a metrics tab somewhere in the Web UI where we could show the data from ${apiAddr}/debug/metrics/prometheus graphically. That would certainly be an interesting idea and useful for some kinds of users.
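
As a rough illustration of reading that endpoint, here is a minimal TypeScript sketch. The names `FetchLike`, `metricsUrl` and `fetchMetricsText` are hypothetical and not part of the Web UI codebase; the fetch function is injected so the sketch does not assume a particular runtime.

```typescript
// Minimal shape of the fetch-like function this sketch needs (hypothetical type).
type FetchLike = (url: string) => Promise<{ ok: boolean; text(): Promise<string> }>;

// Build the metrics URL from the API address, tolerating a trailing slash.
function metricsUrl(apiAddr: string): string {
  return `${apiAddr.replace(/\/+$/, "")}/debug/metrics/prometheus`;
}

// Returns the raw Prometheus text, or null when the endpoint is not reachable
// or answers with a non-2xx status.
async function fetchMetricsText(apiAddr: string, fetchFn: FetchLike): Promise<string | null> {
  try {
    const res = await fetchFn(metricsUrl(apiAddr));
    return res.ok ? await res.text() : null;
  } catch {
    return null; // network error: treat as "metrics unavailable"
  }
}
```

In a browser the global `fetch` could be passed as `fetchFn`, for example `fetchMetricsText("http://127.0.0.1:5001", fetch)`, assuming the node's API listens on that address and allows the request.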

Leaving this as a WIP Issue

Agreed upon metrics

These metrics are ones that we’ve decided we will implement.

| Metric | Where do we get it | Requires change to Kubo/libp2p/etc? | Code sample | Notes |
| --- | --- | --- | --- | --- |

Possible metrics

These metrics have strong arguments and use cases, but need to be discussed to decide whether they are useful enough to surface.

| Metric | Where do we get it | Requires change to Kubo/libp2p/etc? | Code sample | Notes |
| --- | --- | --- | --- | --- |
| downloadedSize | stats.bitswap | No | `const { dataReceived } = await getIpfs().stats.bitswap();` | Discussion at https://github.com/ipfs/ipfs-webui/pull/1942 |
| sharedSize | stats.bitswap | No | `const { dataSent } = await getIpfs().stats.bitswap();` | Discussion at https://github.com/ipfs/ipfs-webui/pull/1942 |
| dialable | kubo client/server mode | Yes | TBD | This metric surfaces whether a node is in server/client (serving/leeching) mode. We should be able to infer this, but it needs analysis. |
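
The two `stats.bitswap` samples above could be combined into one helper. This is only a sketch: `IpfsLike`, `getTransferMetrics` and `formatBytes` are hypothetical names, and the counters are typed as `bigint | number` on the assumption that the client may return big-number values rather than plain numbers.

```typescript
// Subset of the bitswap stats used here (assumed field types, see lead-in).
interface BitswapStatsLike {
  dataReceived: bigint | number;
  dataSent: bigint | number;
}

// Hypothetical minimal client shape: only stats.bitswap() is needed.
interface IpfsLike {
  stats: { bitswap(): Promise<BitswapStatsLike> };
}

interface TransferMetrics {
  downloadedSize: number; // bytes received over bitswap
  sharedSize: number; // bytes sent over bitswap
}

// Read both counters with a single stats.bitswap() call.
async function getTransferMetrics(ipfs: IpfsLike): Promise<TransferMetrics> {
  const { dataReceived, dataSent } = await ipfs.stats.bitswap();
  // Number() loses precision above 2^53 bytes, which is acceptable for display.
  return { downloadedSize: Number(dataReceived), sharedSize: Number(dataSent) };
}

// Format a byte count with binary units for display in the UI.
function formatBytes(bytes: number): string {
  const units = ["B", "KiB", "MiB", "GiB", "TiB"];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i++;
  }
  return `${value.toFixed(i === 0 ? 0 : 1)} ${units[i]}`;
}
```

With the `getIpfs()` accessor from the table this would be called as `getTransferMetrics(getIpfs())`, and `formatBytes(1536)` gives `"1.5 KiB"`.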

Disqualified metrics

These metrics will not be included in the metrics view for one reason or another.

| Metric | Where do we get it | Code sample | Notes |
| --- | --- | --- | --- |

Looking for community and IPFS Implementers’ feedback on this issue.

Here are some questions to help get the ideas flowing.

Which metrics should we focus on?

What metrics do you currently use/view/monitor and how do you obtain them?
- Would it benefit you to see those metrics graphed in the Webui/desktop app?
- Which of those metrics are difficult to get access to or discover?
  - e.g. metrics that require comprehensive calculations, metrics that you always seem to forget how to obtain, metrics that require a complicated process to obtain, etc.
- What metrics do you desire but are not currently obtainable? Why do you need them (use case)?
  - Local node uptime/downtime - To monitor the stability of my node, because I need my home IPFS node to always be available
  - Local node activity (global requests, responses, latency, etc.) - To monitor how much my node is used (just curious)
  - Total peers connected over time - To monitor the health of my IPFS network
  - Count and frequency of requests that time out - To denylist/allowlist certain good/bad peers
  - etc.

What/Why/How

What problems do you currently have that a specific metrics view in the webui and IPFS desktop could help you solve?
Do you need customizable metrics views beyond selecting a time window?
What kind of charts & graphs would benefit you most?
Would you use the metrics view instead of your current tool of choice if the metrics were available in the IPFS desktop and webui?

Issue Analytics

State:
Created 4 years ago
Reactions:3
Comments:7 (7 by maintainers)

Top GitHub Comments

1 reaction

olizilla commented, Apr 22, 2022

On my way to work this morning I started day-dreaming about how nice it would be to get back to working on developer tools for IPFS & co. A dangerous move as I was riding my bike at the time, but lo! What serendipitous timing! I’d love to pitch in on this!

That wall of text is the only format for getting that particular list of data points. It’s intended for consumption by Prometheus, the timeseries db, but it’s well standardised and much tooling exists for it. The numbers are all point-in-time measurements, intended to be scraped periodically, so it’s not such a bad fit for a local dev tool that is already in the habit of polling the api for info.

There are some useful metrics in there that we use all the time now to check whether a node is working well, such as bitswap queue length. In general, though, many of the metrics are very specific and only really of use to a developer who is making specific changes to a specific subsystem. We should build up a short list of metrics that are worth putting the spotlight on, and then give experts a way to get to the kitchen sink.
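
Because the numbers are point-in-time measurements, a UI that polls the endpoint has to derive anything rate-like from two consecutive scrapes. A minimal sketch of that calculation for a cumulative counter follows; `Sample` and `ratePerSecond` are hypothetical names, and treating a decrease as a reset is an assumption about how the UI should behave.

```typescript
// One scraped value of a counter together with the time it was scraped.
interface Sample {
  timestampMs: number;
  value: number;
}

// Per-second rate between two scrapes of a cumulative counter.
// Returns null when no rate can be computed: no time has elapsed, or the
// counter went backwards (which usually means the node restarted).
function ratePerSecond(prev: Sample, curr: Sample): number | null {
  const elapsedSec = (curr.timestampMs - prev.timestampMs) / 1000;
  if (elapsedSec <= 0) return null;
  const delta = curr.value - prev.value;
  if (delta < 0) return null;
  return delta / elapsedSec;
}
```

For example, a counter that moves from 100 to 350 between scrapes taken 5 seconds apart yields a rate of 50 per second.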

1 reaction

lidel commented, Apr 21, 2022

My quick take is:

We want to provide ipfs-webui (go-ipfs, ipfs-desktop) users with a quick view for exploring metrics without the need to set up Expression Browser or Grafana.
Rationale: text is fine for basic counts, but a useful visualization of histogram metrics makes it much easier to answer questions like:
- Is my gateway slow to respond, or is the problem reported by the user just an outlier? If so, how fast is the 95th percentile of responses?
- What is the most popular gateway response type? Which request type takes the most time?
- How long do most QUIC/TCP connections last?
- Is there a visible degradation in the duration of my datastores’ reads/writes?
- etc.
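
To make the percentile question concrete: a Prometheus histogram exposes cumulative bucket counts, each tagged with an upper bound `le`, so a percentile can only be estimated. The sketch below uses linear interpolation inside the bucket that contains the target rank, similar in spirit to Prometheus's `histogram_quantile`. The names `Bucket` and `estimateQuantile` are hypothetical, and the lowest bucket is assumed to start at 0.

```typescript
// One cumulative bucket: `count` observations were <= `le`.
interface Bucket {
  le: number; // upper bound; Infinity for the "+Inf" bucket
  count: number;
}

// Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  if (q < 0 || q > 1 || buckets.length === 0) return NaN;
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;

  const rank = q * total;
  // First bucket whose cumulative count reaches the target rank.
  const idx = sorted.findIndex((b) => b.count >= rank);
  const bucket = sorted[idx];

  // Rank falls in the +Inf bucket: fall back to the highest finite bound.
  if (!Number.isFinite(bucket.le)) {
    return idx > 0 ? sorted[idx - 1].le : NaN;
  }

  const lowerBound = idx > 0 ? sorted[idx - 1].le : 0; // assumption: values start at 0
  const lowerCount = idx > 0 ? sorted[idx - 1].count : 0;
  const inBucket = bucket.count - lowerCount;
  if (inBucket === 0) return bucket.le;

  // Linear interpolation within the bucket.
  return lowerBound + (bucket.le - lowerBound) * ((rank - lowerCount) / inBucket);
}
```

With buckets `le=0.1: 50`, `le=0.5: 90`, `le=1: 98`, `le=+Inf: 100`, the 95th percentile lands in the 0.5 to 1 bucket and is estimated at roughly 0.81. The accuracy depends entirely on how the bucket boundaries were chosen.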

So in practice, we can’t hardcode or assume too much:

Figure out how/where to add a “Diagnostic” screen to ipfs-webui
- Make it so that it can have multiple tabs (we want metrics, but we will also add logs and other diagnostic tooling there over time, so it is good to create a foundation for that from the start)
The “Metrics” tab should be a generic solution
- Show it only when the debug/metrics/prometheus endpoint is available, and provide a UI for exploring the metrics listed there.
- Dynamic discovery: assume new metrics can be added over time, and old metrics can be removed or deprecated, so we can’t expect anything to stay the same.
- The good news is that this wall of text is becoming a standard(?):
  1. https://prometheus.io/docs/instrumenting/exposition_formats/#openmetrics-text-format
  2. https://github.com/OpenObservability/OpenMetrics
  3. Given that each metric has a standard TYPE (like counter or histogram), a unique name, and a description defined in HELP lines, we should be able to generate a UI for each (if there is no JS tooling for this format already; needs research)
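
The dynamic discovery idea can be sketched by scanning only the `# HELP` and `# TYPE` comment lines of the text format. This is a hand-rolled illustration, not existing tooling: `MetricInfo` and `discoverMetrics` are hypothetical names, escape sequences in HELP text are not decoded, and metrics without a TYPE line are reported as `untyped`, the format's default.

```typescript
interface MetricInfo {
  name: string;
  type: string; // e.g. "counter", "gauge", "histogram", "summary", "untyped"
  help: string;
}

// Collect metric names, types and descriptions from Prometheus text output.
function discoverMetrics(expositionText: string): MetricInfo[] {
  const byName = new Map<string, MetricInfo>();

  // Look up the entry for a metric, creating it on first sight.
  const entry = (name: string): MetricInfo => {
    let info = byName.get(name);
    if (!info) {
      info = { name, type: "untyped", help: "" };
      byName.set(name, info);
    }
    return info;
  };

  for (const line of expositionText.split("\n")) {
    const match = /^# (HELP|TYPE) (\S+)\s*(.*)$/.exec(line.trim());
    if (!match) continue; // sample lines and other comments are skipped
    const [, keyword, name, rest] = match;
    if (keyword === "HELP") entry(name).help = rest;
    else entry(name).type = rest.trim();
  }
  return Array.from(byName.values());
}
```

The resulting list could drive a generic UI, for instance picking a chart component per `type` and using `help` as the tooltip, without hardcoding any metric names.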