CPU System Metrics collection
🚀 Feature
Provide CPU profiling similar to the GPU and XLA profiling provided by DeviceStatsMonitor. It would be nice if you could specify which device you want to profile with DeviceStatsMonitor, rather than the profiling defaulting to whatever accelerator you are using.
Motivation
I am running out of CPU memory and I need to figure out where this is occurring. It would be nice if I could easily monitor CPU stats (memory usage, percent utilization, etc.).
Pitch
Modify DeviceStatsMonitor to take a device arg that allows you to specify which device to profile. You can then pass multiple DeviceStatsMonitor callbacks to Trainer. The CPU Monitor can use psutil to track common memory attributes.
Alternatives
N/A
Additional context
Also discussed here: https://github.com/PyTorchLightning/pytorch-lightning/issues/9032#issuecomment-943743996
cc @borda @kaushikb11 @awaelchli @justusschock @akihironitta @rohitgr7
@carmocca @awaelchli Any thoughts on this?
@EricWiener Would you have some interest in contributing this feature?
@tchaton that looks reasonable to me!