Crashes in version 1.0: on psutil.Process.cpu_percent, when process is gone

Describe the bug

After upgrading from version 0.6.0, which had worked fine for a long time, to 1.0, the program crashes regularly with an exit code of 1 and no error message. The crashes occur at random intervals, anywhere from a few minutes to almost an hour after start. Creating a fresh virtual env did not help, and switching back to 0.6.0 stops the crashes.

Screenshots or Program Output

$ gpustat --debug
> An error while retrieving `fan_speed`: Not Supported
Traceback (most recent call last):
  File "path/to/home/dir/.venv/lib64/python3.10/site-packages/gpustat/core.py", line 425, in get_gpu_info
    fan_speed = N.nvmlDeviceGetFanSpeed(handle)
  File "path/to/home/dir/.venv/lib64/python3.10/site-packages/pynvml.py", line 1942, in nvmlDeviceGetFanSpeed
    _nvmlCheckReturn(ret)
  File "path/to/home/dir/.venv/lib64/python3.10/site-packages/pynvml.py", line 765, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_NotSupported: Not Supported

> An error while retrieving `power_limit`: Not Supported
Traceback (most recent call last):
  File "path/to/home/dir/.venv/lib64/python3.10/site-packages/gpustat/core.py", line 461, in get_gpu_info
    power_limit = N.nvmlDeviceGetEnforcedPowerLimit(handle)
  File "path/to/home/dir/.venv/lib64/python3.10/site-packages/pynvml.py", line 2025, in nvmlDeviceGetEnforcedPowerLimit
    _nvmlCheckReturn(ret)
  File "path/to/home/dir/.venv/lib64/python3.10/site-packages/pynvml.py", line 765, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_NotSupported: Not Supported

> An error while retrieving `fan_speed`: Not Supported -> Total 1 occurrences.
> An error while retrieving `power_limit`: Not Supported -> Total 1 occurrences.
$ nvidia-smi
Fri Dec  2 13:31:01 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06    Driver Version: 520.56.06    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   44C    P5    20W /  N/A |   1000MiB /  6144MiB |     21%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Environment information:

  • OS: Fedora 36
  • NVIDIA Driver version: 520.56.06
  • The name(s) of GPU card: NVIDIA GeForce RTX 3060 Laptop GPU
  • gpustat version: 1.0.0
  • pynvml version:
      nvidia-ml-py 11.495.46
      nvidia-ml-py3 7.352.0

Command used: gpustat -cp --watch

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

ItsABlackScreen commented, Dec 6, 2022

I went ahead and inserted the print statements, but there were still no error messages, just an exit code of 1. I realized that the --watch option was probably clearing the terminal, so I dumped everything to a log file and finally have the error messages.

This is the output from after the last successful run until the process crashed:

Error on querying NVIDIA devices. Use --debug flag to see more details.
process no longer exists (pid=3949264)

Traceback (most recent call last):
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/psutil/_common.py", line 443, in wrapper
    ret = self._cache[fun]
AttributeError: 'Process' object has no attribute '_cache'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/psutil/_pslinux.py", line 1645, in wrapper
    return fun(self, *args, **kwargs)
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/psutil/_common.py", line 446, in wrapper
    return fun(self)
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/psutil/_pslinux.py", line 1687, in _parse_stat_file
    data = bcat("%s/%s/stat" % (self._procfs_path, self.pid))
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/psutil/_common.py", line 776, in bcat
    return cat(fname, fallback=fallback, _open=open_binary)
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/psutil/_common.py", line 764, in cat
    with _open(fname) as f:
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/psutil/_common.py", line 728, in open_binary
    return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/3949264/stat'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/gpustat/cli.py", line 58, in print_gpustat
    gpu_stats = GPUStatCollection.new_query(debug=debug)
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/gpustat/core.py", line 600, in new_query
    gpu_info = get_gpu_info(handle)
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/gpustat/core.py", line 563, in get_gpu_info
    process["cpu_percent"] = cache_process.cpu_percent()
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/psutil/__init__.py", line 999, in cpu_percent
    pt2 = self._proc.cpu_times()
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/psutil/_pslinux.py", line 1645, in wrapper
    return fun(self, *args, **kwargs)
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/psutil/_pslinux.py", line 1836, in cpu_times
    values = self._parse_stat_file()
  File "/path/to/home/dir/gpustat/.venv/lib64/python3.10/site-packages/psutil/_pslinux.py", line 1652, in wrapper
    raise NoSuchProcess(self.pid, self._name)
psutil.NoSuchProcess: process no longer exists (pid=3949264)
GetCount
GetHandleByIndex
GetName
GetUUID
GetTemperature
GetFanSpeed
GetMemoryInfo
GetUtilizationRates
GetEncoderUtilization
GetDecoderUtilization
GetPowerUsage
GetEnforcedPowerLimit
GetComputeRunningProcess
GetGraphiceRunningProcesses

On three different runs the crash had exactly the same cause. Note also that the v0.6 process has been running fine the entire time (2 days and some change).
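
For readers hitting the same traceback: it shows `psutil.NoSuchProcess` escaping from `cache_process.cpu_percent()` when a GPU process exits between being listed and being queried. Below is a minimal sketch of guarding that call. The helper name `safe_cpu_percent` and the short-lived child process are hypothetical and only illustrate the race; this is not necessarily how gpustat fixed it.

```python
import subprocess
import sys

import psutil


def safe_cpu_percent(proc):
    """Return proc.cpu_percent(), or None if the process is already gone.

    Hypothetical helper for illustration; not part of gpustat or psutil.
    """
    try:
        return proc.cpu_percent()
    except psutil.NoSuchProcess:
        # The PID vanished between being listed and being queried.
        return None


# Reproduce the race: start a short-lived child, keep a psutil handle to it,
# let it exit, and only then ask for its CPU usage.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.5)"])
proc = psutil.Process(child.pid)
child.wait()  # the child has exited and been reaped

result = safe_cpu_percent(proc)
print("cpu_percent for exited process:", result)
```

Returning None lets the caller skip or blank out the CPU column for that process instead of aborting the whole query loop.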

ItsABlackScreen commented, Dec 13, 2022

@wookayin There have been no crashes since the fix; the issue is now resolved.
