Ungraceful shutdown of python script can leave orphan processes
🐛 Bug
CompilerGym uses a client/service architecture. Every time a CompilerEnv object is created, a CompilerService subprocess is started. The lifetime of the subprocess is managed by the CompilerEnv. Calling CompilerEnv.close() terminates the service.
If for some reason CompilerEnv.close() is not called (through either a system or a user error), the CompilerService will not be killed and will remain dormant indefinitely.
To Reproduce
In one terminal, open a Python interpreter and start a CompilerGym environment. Make a note of the process IDs of the interpreter and of the environment’s service:
In [1]: import os
In [2]: os.getpid()
Out[2]: 5425
In [3]: import compiler_gym
In [4]: env = compiler_gym.make("llvm-v0")
In [5]: env.service.connection.process
Out[5]: <subprocess.Popen at 0x7fbe10855790>
In [6]: env.service.connection.process.pid
Out[6]: 5809
In another terminal, kill the interpreter process, and observe that the CompilerGym environment’s service is still running:
$ kill -9 5425
$ ps aux | grep 5809
cummins 6087 0.0 0.0 4408696 864 s002 S+ 1:16PM 0:00.00 grep --color=auto 5809
cummins 5809 0.0 0.0 4499680 14628 s000 S 1:15PM 0:00.02 ./compiler_gym-llvm-service --working_dir=/Users/cummins/.cache/compiler_gym/s/0720T131545-167414-6660
That process will remain dormant until explicitly killed, or the machine is rebooted.
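The liveness check done with ps above can also be scripted. The following is a minimal sketch, assuming a POSIX system; pid_exists is a hypothetical helper name, not part of CompilerGym.

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only).

    Signal 0 performs the existence and permission checks without
    delivering a signal. Do not use this on Windows, where os.kill()
    behaves differently.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # No such process.
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

With the PIDs from the session above, pid_exists(5809) would be expected to stay true after the interpreter (5425) has been killed.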
Expected behavior
After some period of inactivity, the service should realize that it is no longer being used and shut down gracefully.
To the best of my understanding, it is not possible to guarantee that the parent process gets the chance to run a subprocess shutdown routine (kill -9, as in the reproduction above, gives it none), so the proposed workaround is a ‘time to live’ timer on each subprocess: if the service sees no activity for that period, it shuts itself down.
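The idea of a ‘time to live’ timer can be sketched as an idle watchdog. This is only an illustration of the technique in Python, not CompilerGym's implementation; the class name IdleWatchdog and its methods are hypothetical. The service would call touch() on every request and pass its own shutdown routine as on_expire.

```python
import threading


class IdleWatchdog:
    """Calls `on_expire` if `touch()` is not called for `ttl_seconds`."""

    def __init__(self, ttl_seconds, on_expire):
        self._ttl = ttl_seconds
        self._on_expire = on_expire
        self._lock = threading.Lock()
        self._timer = None
        self.touch()  # Start the first countdown.

    def touch(self):
        """Record activity: cancel the pending countdown and start a new one."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._ttl, self._on_expire)
            self._timer.daemon = True  # Do not keep the process alive.
            self._timer.start()

    def cancel(self):
        """Stop the watchdog, e.g. during a normal shutdown."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

Because the timer runs inside the service itself, it still fires when the client has been killed without any chance to clean up.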
Workaround
If you suspect that there are dormant LLVM CompilerGym services and you are not currently running any CompilerGym Python scripts, you can manually kill them using:
ps aux | grep compiler_gym-llvm-service | grep -v grep | awk '{print $2}' | xargs --no-run-if-empty kill
although this does not tidy up any temporary cache files that the environments have created.
Environment
Please fill in this checklist:
- CompilerGym: v0.1.9
- How you installed CompilerGym (conda, pip, source): n/a
- OS: n/a
- Python version: n/a
Additional context
See the documentation for more background on CompilerGym’s client/service architecture.
Top GitHub Comments
Yes, those are safe to kill. Those processes are cBench binaries which are used to compute runtime. If they fail to complete within a specified timeout (default 5 min), the process will be abandoned. Adding a proper subprocess timeout would be a nice feature. The code is here: https://github.com/facebookresearch/CompilerGym/blob/development/compiler_gym/util/Subprocess.cc#L62-L66
Cheers, Chris
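For illustration, here is what a subprocess timeout that kills the child rather than abandoning it could look like. This is a Python sketch of the general technique, not the C++ code linked above; run_with_timeout is a hypothetical helper name.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run `argv`; return its exit code, or None if it exceeded the timeout."""
    try:
        completed = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before
        # re-raising, so no orphan is left behind. Processes spawned by
        # the child itself are not covered.
        return None
    return completed.returncode


if __name__ == "__main__":
    # A child that sleeps longer than the timeout is killed.
    slow = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_timeout(slow, timeout_seconds=1.0))
```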
I noticed this problem when I implemented an algorithm that calls env.fork() a lot; sometimes I forgot to call env.close(), or the program was interrupted. What I would really like, to address this problem (indirectly), is a context manager for the compiler environment class. contextlib.closing can partly do this job already, but it would be nice to have native support. If we have such support, it should be the recommended way of managing CompilerGym environments, just as with open(...) as ... is THE way of opening files in Python.
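A minimal sketch of the contextlib.closing approach follows. FakeEnv is a hypothetical stand-in so that the example runs without CompilerGym installed; the assumption is that any object with a close() method, such as the environment returned by compiler_gym.make("llvm-v0"), can be wrapped the same way.

```python
from contextlib import closing


class FakeEnv:
    """Hypothetical stand-in for an environment that owns a service."""

    def __init__(self):
        self.closed = False

    def step(self, action):
        if action < 0:
            raise ValueError("invalid action")
        return action

    def close(self):
        # A real environment would terminate its service subprocess here.
        self.closed = True


env = FakeEnv()
try:
    # closing() calls env.close() when the block exits, even on error.
    with closing(env) as e:
        e.step(-1)
except ValueError:
    pass

print(env.closed)
```

This covers forgotten close() calls and ordinary exceptions, including KeyboardInterrupt. It cannot help when the interpreter is killed with kill -9, since no Python cleanup code runs in that case.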