[Serve] psutil error, possibly memory related

What is the problem?

When running a lot of replicas, the following psutil traceback occurs:

2020-11-05 16:43:00,476 ERROR worker.py:1057 -- Possible unhandled error from worker: ray::ServeController.report_queue_lengths() (pid=21248, ip=192.168.1.13)
  File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_common.py", line 449, in wrapper
    ret = self._cache[fun]
AttributeError: 'Process' object has no attribute '_cache'

During handling of the above exception, another exception occurred:

ray::ServeController.report_queue_lengths() (pid=21248, ip=192.168.1.13)
  File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_common.py", line 449, in wrapper
    ret = self._cache[fun]
AttributeError: _cache

During handling of the above exception, another exception occurred:

ray::ServeController.report_queue_lengths() (pid=21248, ip=192.168.1.13)
  File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_psosx.py", line 342, in wrapper
    return fun(self, *args, **kwargs)
  File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_common.py", line 452, in wrapper
    return fun(self)
  File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_psosx.py", line 404, in _get_pidtaskinfo
    ret = cext.proc_pidtaskinfo_oneshot(self.pid)
PermissionError: [Errno 1] Operation not permitted (originated from proc_pidinfo())

During handling of the above exception, another exception occurred:

ray::ServeController.report_queue_lengths() (pid=21248, ip=192.168.1.13)
  File "python/ray/_raylet.pyx", line 439, in ray._raylet.execute_task
  File "/Users/archit/ray/python/ray/memory_monitor.py", line 132, in raise_if_low_memory
    self.error_threshold))
  File "/Users/archit/ray/python/ray/memory_monitor.py", line 42, in get_message
    proc_stats.append((get_rss(proc.memory_info()), pid,
  File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_common.py", line 452, in wrapper
    return fun(self)
  File "/Users/archit/ray/python/ray/thirdparty_files/psutil/__init__.py", line 1074, in memory_info
    return self._proc.memory_info()
  File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_psosx.py", line 342, in wrapper
    return fun(self, *args, **kwargs)
  File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_psosx.py", line 473, in memory_info
    rawtuple = self._get_pidtaskinfo()
  File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_psosx.py", line 349, in wrapper
    raise AccessDenied(self.pid, self._name)
psutil.AccessDenied: psutil.AccessDenied (pid=0)

If I use fewer replicas (4 instead of 12), there are no errors, so maybe it's just not handling an out-of-memory condition gracefully.
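For illustration, here is a minimal sketch of how a per-process memory scan could tolerate processes it is not allowed to inspect (such as pid 0 on macOS) instead of letting psutil.AccessDenied escape. This is not Ray's memory_monitor.py code; collect_rss and top_memory_consumers are hypothetical names, and only psutil.process_iter(), Process.memory_info(), psutil.AccessDenied and psutil.NoSuchProcess are real psutil APIs.

```python
try:
    import psutil  # third-party; optional so collect_rss can be used without it
except ImportError:
    psutil = None


def collect_rss(procs, skip_errors):
    """Return (rss_bytes, pid) pairs sorted by RSS, largest first.

    Processes whose memory_info() raises one of `skip_errors` are skipped
    rather than aborting the whole scan. (Hypothetical helper.)
    """
    stats = []
    for proc in procs:
        try:
            stats.append((proc.memory_info().rss, proc.pid))
        except skip_errors:
            continue
    return sorted(stats, reverse=True)


def top_memory_consumers(limit=10):
    """Scan all visible processes with psutil, tolerating unreadable ones."""
    if psutil is None:
        raise RuntimeError("psutil is not installed")
    # Assumption: unreadable or vanished processes are safe to leave out
    # of a diagnostic message.
    skip = (psutil.AccessDenied, psutil.NoSuchProcess)
    return collect_rss(psutil.process_iter(), skip)[:limit]
```

With psutil installed, top_memory_consumers() returns up to ten (rss, pid) pairs. Skipping unreadable processes means the list can be incomplete, which seems acceptable for an error message but would not be for exact accounting.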

Ray version and other system information (Python version, TensorFlow version, OS): Ray master branch and latest wheels, macOS, Python 3.6, TensorFlow 2.3.0, transformers 3.4.0

Reproduction (REQUIRED)

Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):

import ray
from ray import serve
from ray.serve import BackendConfig
import time
from transformers import pipeline

ray.init()
client = serve.start()

class Servable:
    def __init__(self):
        # Each replica constructs its own text-generation pipeline.
        self.nlp_model = pipeline("text-generation")

    def __call__(self, request):
        return self.nlp_model(request.data, max_length=50)

# no errors on my machine if I set num_replicas=4 here:
client.create_backend("generation", Servable, config=BackendConfig(num_replicas=12))

client.create_endpoint("endpoint", backend="generation")
time.sleep(100)

Removing /ray/python/ray/thirdparty_files and running bazel clean --expunge, bazel build //:ray_pkg, and pip install -e . --verbose didn’t change anything.

If we cannot run your script, we cannot fix your issue.

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

edoakes commented, May 28, 2021 (1 reaction)

@architkulkarni when you have a chance could you verify that the above PR causes this to OOM more gracefully so we can close this?

clarkzinzow commented, May 27, 2021 (1 reaction)

Quick example of the psutil failure on macOS:

In [1]: import psutil

In [2]: p = psutil.Process(psutil.pids()[10])

In [3]: p.username()
Out[3]: 'root'

In [4]: p.memory_info()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/psutil/_common.py in wrapper(self)
    446             # case 1: we previously entered oneshot() ctx
--> 447             ret = self._cache[fun]
    448         except AttributeError:

AttributeError: 'Process' object has no attribute '_cache'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/psutil/_common.py in wrapper(self)
    446             # case 1: we previously entered oneshot() ctx
--> 447             ret = self._cache[fun]
    448         except AttributeError:

AttributeError: _cache

During handling of the above exception, another exception occurred:

PermissionError                           Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/psutil/_psosx.py in wrapper(self, *args, **kwargs)
    343         try:
--> 344             return fun(self, *args, **kwargs)
    345         except ProcessLookupError:

/usr/local/lib/python3.9/site-packages/psutil/_common.py in wrapper(self)
    449             # case 2: we never entered oneshot() ctx
--> 450             return fun(self)
    451         except KeyError:

/usr/local/lib/python3.9/site-packages/psutil/_psosx.py in _get_pidtaskinfo(self)
    405         with catch_zombie(self):
--> 406             ret = cext.proc_pidtaskinfo_oneshot(self.pid)
    407         assert len(ret) == len(pidtaskinfo_map)

PermissionError: [Errno 1] Operation not permitted (originated from proc_pidinfo())

During handling of the above exception, another exception occurred:

AccessDenied                              Traceback (most recent call last)
<ipython-input-7-1cc9c5f33ec8> in <module>
----> 1 p.memory_info()

/usr/local/lib/python3.9/site-packages/psutil/_common.py in wrapper(self)
    448         except AttributeError:
    449             # case 2: we never entered oneshot() ctx
--> 450             return fun(self)
    451         except KeyError:
    452             # case 3: we entered oneshot() ctx but there's no cache

/usr/local/lib/python3.9/site-packages/psutil/__init__.py in memory_info(self)
   1052         All numbers are expressed in bytes.
   1053         """
-> 1054         return self._proc.memory_info()
   1055
   1056     @_common.deprecated_method(replacement="memory_info")

/usr/local/lib/python3.9/site-packages/psutil/_psosx.py in wrapper(self, *args, **kwargs)
    342     def wrapper(self, *args, **kwargs):
    343         try:
--> 344             return fun(self, *args, **kwargs)
    345         except ProcessLookupError:
    346             if is_zombie(self.pid):

/usr/local/lib/python3.9/site-packages/psutil/_psosx.py in memory_info(self)
    473     @wrap_exceptions
    474     def memory_info(self):
--> 475         rawtuple = self._get_pidtaskinfo()
    476         return pmem(
    477             rawtuple[pidtaskinfo_map['rss']],

/usr/local/lib/python3.9/site-packages/psutil/_psosx.py in wrapper(self, *args, **kwargs)
    349                 raise NoSuchProcess(self.pid, self._name)
    350         except PermissionError:
--> 351             raise AccessDenied(self.pid, self._name)
    352         except cext.ZombieProcessError:
    353             raise ZombieProcess(self.pid, self._name, self._ppid)

AccessDenied: psutil.AccessDenied (pid=71)
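
The stacked "During handling of the above exception, another exception occurred" sections are Python's implicit exception chaining: only the last exception (AccessDenied) actually reaches the caller, and the earlier ones hang off its __context__. The sketch below imitates that pattern with stand-in classes. It is not psutil's source; AccessDenied, Process and exception_chain here are hypothetical and for illustration only.

```python
class AccessDenied(Exception):
    """Stand-in for psutil.AccessDenied (illustration only)."""


class Process:
    """Stand-in that mimics the cache-miss-then-OS-error sequence."""

    def _cached_or_fresh(self):
        try:
            # No _cache attribute has been set, so this raises AttributeError.
            return self._cache["memory_info"]
        except AttributeError:
            # Fall back to asking the OS, which refuses.
            return self._query_os()

    def _query_os(self):
        raise PermissionError(1, "Operation not permitted")

    def memory_info(self):
        try:
            return self._cached_or_fresh()
        except PermissionError:
            # Raised inside an except block, so PermissionError is chained.
            raise AccessDenied("pid=0")


def exception_chain(exc):
    """Return exception type names from the outermost error back to the first."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__context__
    return names


try:
    Process().memory_info()
except AccessDenied as err:
    chain = exception_chain(err)

print(chain)
```

Read this way, the AttributeError about _cache is only the cache miss from the traceback's "case 2: we never entered oneshot() ctx" path; the PermissionError from the OS call is what gets converted into AccessDenied.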