CPU System Metrics collection

Original GitHub issue: #11253

🚀 Feature

Provide CPU profiling similar to the GPU and XLA profiling provided by DeviceStatsMonitor. It would be nice to be able to specify which device DeviceStatsMonitor profiles, rather than having it default to whichever accelerator you are using.

Motivation

I am running out of CPU memory and need to figure out where this is happening. It would be nice to be able to easily monitor CPU stats (memory usage, percent utilization, etc.).

Pitch

Modify DeviceStatsMonitor to take a device argument that specifies which device to profile. You could then pass multiple DeviceStatsMonitor callbacks to Trainer. The CPU monitor could use psutil to track common memory attributes.
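The pitch above could look roughly like the following. This is a minimal sketch, not Lightning's API. `DeviceStatsLogger`, `get_cpu_stats`, the `device` argument and the `log` callable are hypothetical names, and the hook is called by hand instead of being registered as a Lightning `Callback`. It assumes only the psutil calls `cpu_percent`, `virtual_memory` and `Process().memory_info()`.

```python
import psutil  # third-party: pip install psutil


def get_cpu_stats() -> dict:
    """Hypothetical helper: snapshot of host CPU and memory usage."""
    vm = psutil.virtual_memory()
    proc = psutil.Process()  # the current (training) process
    return {
        # System-wide CPU utilization since the previous call, in percent.
        # With interval=None the very first call returns 0.0.
        "percent": psutil.cpu_percent(interval=None),
        "vm_percent": vm.percent,  # share of system RAM in use, %
        "vm_available_mb": vm.available / 2**20,
        "rss_mb": proc.memory_info().rss / 2**20,  # this process' resident memory
    }


class DeviceStatsLogger:
    """Hypothetical monitor that profiles an explicitly chosen device."""

    def __init__(self, device: str = "cpu", log=print):
        if device != "cpu":
            # GPU/XLA collection is out of scope for this sketch.
            raise NotImplementedError(f"device={device!r} is not handled here")
        self.device = device
        self.log = log

    def on_train_batch_end(self, step: int) -> dict:
        # Prefix keys with the device name so several monitors can log side by side.
        stats = {f"{self.device}/{k}": v for k, v in get_cpu_stats().items()}
        self.log(step, stats)
        return stats


if __name__ == "__main__":
    monitor = DeviceStatsLogger(device="cpu")
    for step in range(3):
        monitor.on_train_batch_end(step)
```

Taking the device as an explicit argument, instead of inferring it from the accelerator, is what would allow one CPU monitor and one GPU monitor to run in the same training job.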

Alternatives

N/A

Additional context

Also discussed here: https://github.com/PyTorchLightning/pytorch-lightning/issues/9032#issuecomment-943743996

cc @borda @kaushikb11 @awaelchli @justusschock @akihironitta @rohitgr7

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
tchaton commented, Jan 5, 2022

@carmocca @awaelchli Any thoughts on this?

@EricWiener Would you have some interest in contributing this feature?

1 reaction
ananthsub commented, Jan 4, 2022

@tchaton that looks reasonable to me!
