[Serve] psutil error, possibly memory related
What is the problem?
When running a lot of replicas, the following psutil traceback occurs:
2020-11-05 16:43:00,476 ERROR worker.py:1057 -- Possible unhandled error from worker: ray::ServeController.report_queue_lengths() (pid=21248, ip=192.168.1.13)
File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_common.py", line 449, in wrapper
ret = self._cache[fun]
AttributeError: 'Process' object has no attribute '_cache'
During handling of the above exception, another exception occurred:
ray::ServeController.report_queue_lengths() (pid=21248, ip=192.168.1.13)
File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_common.py", line 449, in wrapper
ret = self._cache[fun]
AttributeError: _cache
During handling of the above exception, another exception occurred:
ray::ServeController.report_queue_lengths() (pid=21248, ip=192.168.1.13)
File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_psosx.py", line 342, in wrapper
return fun(self, *args, **kwargs)
File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_common.py", line 452, in wrapper
return fun(self)
File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_psosx.py", line 404, in _get_pidtaskinfo
ret = cext.proc_pidtaskinfo_oneshot(self.pid)
PermissionError: [Errno 1] Operation not permitted (originated from proc_pidinfo())
During handling of the above exception, another exception occurred:
ray::ServeController.report_queue_lengths() (pid=21248, ip=192.168.1.13)
File "python/ray/_raylet.pyx", line 439, in ray._raylet.execute_task
File "/Users/archit/ray/python/ray/memory_monitor.py", line 132, in raise_if_low_memory
self.error_threshold))
File "/Users/archit/ray/python/ray/memory_monitor.py", line 42, in get_message
proc_stats.append((get_rss(proc.memory_info()), pid,
File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_common.py", line 452, in wrapper
return fun(self)
File "/Users/archit/ray/python/ray/thirdparty_files/psutil/__init__.py", line 1074, in memory_info
return self._proc.memory_info()
File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_psosx.py", line 342, in wrapper
return fun(self, *args, **kwargs)
File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_psosx.py", line 473, in memory_info
rawtuple = self._get_pidtaskinfo()
File "/Users/archit/ray/python/ray/thirdparty_files/psutil/_psosx.py", line 349, in wrapper
raise AccessDenied(self.pid, self._name)
psutil.AccessDenied: psutil.AccessDenied (pid=0)
If I use fewer replicas (4 instead of 12), there are no errors, so Ray may simply not be handling an out-of-memory condition gracefully.
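The last frames of the traceback suggest that the memory monitor calls memory_info() on every process while building its low-memory message, and that a single unreadable process (pid 0 here) aborts the whole report. Below is a minimal sketch of tolerating such processes. It is not necessarily how Ray resolved this issue. The names collect_rss and fake_read_rss are hypothetical, and the reader function and the set of skipped exceptions are passed in so the sketch runs without psutil.

```python
from typing import Callable, Iterable, List, Tuple, Type


def collect_rss(
    pids: Iterable[int],
    read_rss: Callable[[int], int],
    skip_errors: Tuple[Type[BaseException], ...],
) -> List[Tuple[int, int]]:
    """Return (rss_bytes, pid) pairs, largest first, skipping unreadable pids."""
    stats = []
    for pid in pids:
        try:
            stats.append((read_rss(pid), pid))
        except skip_errors:
            # For example pid 0 on macOS, or a process that exited mid-scan.
            continue
    return sorted(stats, reverse=True)


def fake_read_rss(pid: int) -> int:
    """Stand-in reader for illustration only: pid 0 is not readable."""
    if pid == 0:
        raise PermissionError("Operation not permitted")
    return pid * 1024


print(collect_rss([0, 3, 7], fake_read_rss, (PermissionError,)))
# prints [(7168, 7), (3072, 3)]
```

With psutil installed, read_rss could be lambda pid: psutil.Process(pid).memory_info().rss and skip_errors could be (psutil.AccessDenied, psutil.NoSuchProcess), so that one inaccessible process no longer hides the actual low-memory report.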
Ray version and other system information (Python version, TensorFlow version, OS): Ray master branch and latest wheels, macOS, Python 3.6, TensorFlow 2.3.0, transformers 3.4.0
Reproduction (REQUIRED)
Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):
import ray
from ray import serve
from ray.serve import BackendConfig
import time
from transformers import pipeline

ray.init()
client = serve.start()


class Servable:
    def __init__(self):
        self.nlp_model = pipeline("text-generation")

    def __call__(self, request):
        return self.nlp_model(request.data, max_length=50)


# no errors on my machine if I set num_replicas=4 here:
client.create_backend("generation", Servable, config=BackendConfig(num_replicas=12))
client.create_endpoint("endpoint", backend="generation")
time.sleep(100)
Removing /ray/python/ray/thirdparty_files and running bazel clean --expunge, bazel build //:ray_pkg, and pip install -e . --verbose didn’t change anything.
If we cannot run your script, we cannot fix your issue.
- I have verified my script runs in a clean environment and reproduces the issue.
- I have verified the issue also occurs with the latest wheels.
@architkulkarni when you have a chance could you verify that the above PR causes this to OOM more gracefully so we can close this?
Quick example of the psutil failure on macOS:
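The example that followed in the original comment is not preserved here. As a stand-in, here is a minimal sketch, assuming psutil is installed. The traceback in this issue ends with psutil.AccessDenied for pid 0, so querying pid 0 without root on macOS is expected to fail the same way. The helper name probe_memory_info is hypothetical, and results on other platforms may differ.

```python
import os

import psutil


def probe_memory_info(pid: int) -> str:
    """Report whether psutil can read memory info for `pid` (hypothetical helper)."""
    try:
        psutil.Process(pid).memory_info()
    except psutil.AccessDenied:
        return "access_denied"
    except psutil.NoSuchProcess:
        return "no_such_process"
    return "ok"


# The current process should always be readable.
print(os.getpid(), probe_memory_info(os.getpid()))

# pid 0: the traceback in this issue shows AccessDenied on macOS;
# other platforms may report a different status.
print(0, probe_memory_info(0))
```

Any code that walks all pids and calls memory_info() without catching these exceptions would stop at the first unreadable process, which is consistent with the failure in memory_monitor.py shown in the traceback.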