
expose transfer bytes total (count) rather than just gauge of current value


Currently the worker Prometheus endpoint exposes transfer_incoming_bytes and transfer_outgoing_bytes as gauges—i.e., the current value at a single point in time.

A better way to expose this sort of data is as a monotonically increasing counter (exposed as transfer_incoming_bytes_total and transfer_outgoing_bytes_total).

It’s easy to get a rate from an accumulated count, but you can’t get an accurate count from a sampled rate.
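To illustrate the point, here is a minimal sketch in plain Python of deriving both a rate and a total from scrapes of a cumulative counter. The sample values, the 15-second scrape interval, and the helper names are hypothetical and not taken from Dask or Prometheus; the sketch also assumes the counter never resets (e.g., no worker restart between scrapes).

```python
from typing import List, Tuple

# (timestamp in seconds, cumulative bytes transferred); hypothetical data
Sample = Tuple[float, float]


def rates_from_counter(samples: List[Sample]) -> List[float]:
    """Average bytes/second between consecutive scrapes of a cumulative counter."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((v1 - v0) / (t1 - t0))
    return rates


def total_from_counter(samples: List[Sample]) -> float:
    """Bytes transferred over the whole window (assumes no counter reset)."""
    return samples[-1][1] - samples[0][1]


# Hypothetical scrapes of a *_bytes_total style counter, taken every 15 s.
counter_scrapes = [(0.0, 0.0), (15.0, 3000.0), (30.0, 3000.0), (45.0, 9000.0)]

rates = rates_from_counter(counter_scrapes)
total = total_from_counter(counter_scrapes)
```

A gauge scraped at the same instants would only report whatever was in flight at each scrape, so a transfer that starts and finishes between two scrapes would leave no trace in it, whereas the counter still accounts for those bytes.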

Issue Analytics

Created a year ago
Comments: 6 (4 by maintainers)

Top GitHub Comments


ntabris commented, Nov 25, 2022

I think we don’t actually expose bytes transferred over time, i.e., what @crusaderky calls “cumulative” values in https://github.com/dask/distributed/pull/6936#issuecomment-1230524443

That’s what I was asking for. If it’s not high-value, feel free to ignore for now though.

For context, host metrics can tell us how much data moves in/out of each worker. What they can’t tell us (at least not easily) is how much of that is transfer between workers vs. data moving into/out of the cluster (e.g., S3). I think it would be nice if Dask could tell us how many bytes of host network traffic are for transfer.


gjoseph92 commented, Dec 9, 2022

I would also find this useful for benchmarking. The total amount of data transferred is a useful metric to compare when working on changes to scheduling.
