Considerations for worker metrics in dashboard / performance reports


Motivated by dask/distributed#4494, I’ve been thinking a lot about how worker metrics (including GPU memory/utilization) could be represented in the Distributed dashboard / performance reports. This has culminated in a bunch of PRs playing around with dashboard components and the system monitor on my end, but since most of that work has only involved the same ~5 people, it seems worth opening a wider discussion on this.

Some of the information desired from #4494 is:

  • information on a worker metric over the course of an operation
  • peak/average value of a metric during an operation
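As a rough illustration of the two items above, here is a minimal sketch of how peak and average values might be computed from sampled metric data over the duration of an operation. The `summarize_metric` helper, the `(timestamp, value)` sample layout, and the sample numbers are all hypothetical; this is not how Distributed's system monitor stores or reports its data.

```python
from statistics import mean


def summarize_metric(samples, start, stop):
    """Summarize (timestamp, value) samples that fall within [start, stop].

    Hypothetical helper for illustration only; not part of Distributed.
    Returns None when no sample falls inside the window.
    """
    window = [value for timestamp, value in samples if start <= timestamp <= stop]
    if not window:
        return None
    return {"peak": max(window), "average": mean(window)}


# Made-up memory readings (in GB) taken once per second on one worker.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 2.0)]

# Assume the operation of interest ran from t=1.0 to t=2.0.
summary = summarize_metric(samples, start=1.0, stop=2.0)
print(summary)
```

The same summary could feed either a hover tool or a table row, which is part of the display question raised below.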

Some questions/thoughts that come to mind:

  • Would time series plots be useful for tracking worker metrics? My first thought was to have something like the dashboard’s System tab for workers, but it would require some progress in Bokeh (bokeh/bokeh#11101) to get working.
  • Where could statistics (mean/max) of a metric be displayed? My first thought was either as a label / hover tool in a plot of the metric, or in a sortable table like the one currently used in the Workers tab.
  • Should GPU metrics stay separate from other metrics? Currently, GPU information is only available in the standalone GPU tab added by dask/distributed#4556, but it shouldn’t be difficult to add it to something like the Workers table or a hypothetical Workers time series tab. One thing to note here is that since GPU info is collected and displayed conditional on a user having pynvml installed, GPU info will show up even if a user isn’t working with GPUs.
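To make the last point concrete, the sketch below separates "pynvml is installed" from "a GPU is actually visible", using only `nvmlInit`, `nvmlDeviceGetCount`, and `NVMLError` from pynvml. The `visible_gpu_count` function is a hypothetical helper, and this is an assumption about how such a check could look, not a description of what Distributed does today.

```python
def visible_gpu_count():
    """Return the number of NVIDIA GPUs visible to this process.

    Hypothetical helper for illustration only. Returns 0 when pynvml is not
    installed, or when it is installed but NVML cannot be initialized
    (for example, no driver or no GPU on the machine).
    """
    try:
        import pynvml
    except ImportError:
        return 0
    try:
        pynvml.nvmlInit()
        return pynvml.nvmlDeviceGetCount()
    except pynvml.NVMLError:
        return 0


# A dashboard could, in principle, only show GPU columns when this is non-zero.
show_gpu_metrics = visible_gpu_count() > 0
print(show_gpu_metrics)
```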

Interested to hear more opinions on this.

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
mrocklin commented, Apr 2, 2021

Thank you for raising this @charlesbluca. If you’re ok with it, I’m going to move this back to the dask/distributed issue tracker.

I’d like to keep the dask/community tracker free of technical feature discussion. The topic that you bring up is important, and I expect that folks will want to have a lively discussion around it. I’d like to keep this community tracker low enough volume that folks who don’t want to drink from the firehose remain comfortable subscribing here.

0 reactions
jacobtomlinson commented, Apr 27, 2021

Generally the dashboards give us two things:

  • Reassurance that things are happening. For this, aesthetics can win over practicality. For example, the task stream is my favourite plot, but the information could be displayed in a more boring but useful way. For example, this plot may be a better replacement, but it is less visually exciting. But I enjoy running code and seeing it distribute on the cluster, and the pleasant design gets newcomers interested. The cluster map plot is similar.
  • Answering specific questions about the cluster. Am I running out of memory? Which parts of my code took the longest to run? Am I spending lots of time spilling to disk? Are my workers compute, memory or IO bound?

Assuming that this discussion is about the latter, it may be helpful to consider the questions users will be trying to answer with the dashboard, specifically the questions they cannot easily answer today. Then, with those questions in hand, try to work backwards to the plots that we need to create to enable them to be answered.

Some examples:

  • What percentage of time do my workers spend on memory transfer?
  • I have 1000 workers, are they clustered into different bottlenecks (compute, memory, io)?
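Working backwards from these two example questions, a minimal sketch of the underlying calculations might look like the following. It assumes each worker reports total seconds spent per activity; the activity names (`compute`, `transfer`, `disk`), the helper functions, and the numbers are all hypothetical and do not correspond to an existing Distributed API.

```python
def transfer_fraction(seconds_by_activity):
    """Fraction of a worker's recorded time spent on memory transfer."""
    total = sum(seconds_by_activity.values())
    if total == 0:
        return 0.0
    return seconds_by_activity.get("transfer", 0.0) / total


def dominant_activity(seconds_by_activity):
    """Name of the activity that consumed the most time on a worker."""
    return max(seconds_by_activity, key=seconds_by_activity.get)


# Made-up per-worker timings in seconds.
workers = {
    "worker-a": {"compute": 6.0, "transfer": 3.0, "disk": 1.0},
    "worker-b": {"compute": 2.0, "transfer": 2.0, "disk": 6.0},
}

# Group workers by their dominant activity as a crude bottleneck label.
groups = {}
for name, timings in workers.items():
    groups.setdefault(dominant_activity(timings), []).append(name)

for name, timings in workers.items():
    print(name, f"{transfer_fraction(timings):.0%} of time on transfer")
print(groups)
```

With 1000 workers, the `groups` mapping would be the input to whatever plot shows how many workers fall under each bottleneck.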