gunicorn workers are being killed if used in background/async processes
I have a gunicorn + flask app where I use librosa for WAV preprocessing (`preprocess_wav` from the resemblyzer lib). The files are up to 5 MB. Everything works fine when the input request is processed synchronously: the request comes in, the audio is processed, and the client gets a response once processing finishes. But since working with audio takes time, we decided to move it to an async flow using RabbitMQ, so requests go to a consumer and are processed in the background. What I've noticed is that my gunicorn workers are being restarted (roughly 1 to 5 times per hour, never all of them, only one at a time). A lot of debugging showed that it always happens in the `preprocess_wav` function, and the only thing that happens there is a call to `librosa.resample`.
My question, and my guess, is this: could some resources or data be left over after the background processing finishes, unlike in the synchronous case where all resources belong to a request and are cleared when the request ends? Some details about the situation: I can't reproduce it on my local machine (even in a Docker container); it happens only on a test machine running in a Docker container, probably because there is constant request traffic (about 100 requests/hour).
**To Reproduce**
import librosa
from resemblyzer import preprocess_wav

def load_audio(file, n):
    # Load at the native sample rate, limited to the first n seconds
    wav, source_sr = librosa.load(file, sr=None, duration=n)
    return wav, source_sr

wav, sr = load_audio(file, n)
preprocess_wav(wav, sr)  # worker is restarting here
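For context, in production this code runs inside a RabbitMQ consumer rather than a request handler. Below is a minimal sketch of that setup, assuming the `pika` client and a queue named `audio` that carries file paths; none of these names come from the original setup.

```python
import pika
import librosa
from resemblyzer import preprocess_wav

def on_message(ch, method, properties, body):
    # The message body is assumed to carry a path to the uploaded file.
    path = body.decode()
    wav, source_sr = librosa.load(path, sr=None)
    preprocess_wav(wav, source_sr)  # the step during which a worker gets restarted
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="audio")
channel.basic_consume(queue="audio", on_message_callback=on_message)
channel.start_consuming()
```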
**Expected behavior**
The app keeps working and workers are not restarted.
**Software versions**
Linux-5.13.0-39-generic-x86_64-with-glibc2.10
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
NumPy 1.19.2
SciPy 1.5.2
librosa 0.8.1
INSTALLED VERSIONS
------------------
python: 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
librosa: 0.8.1
audioread: 2.1.9
numpy: 1.19.2
scipy: 1.5.2
sklearn: 0.23.2
joblib: 0.17.0
decorator: 4.4.2
soundfile: 0.10.3
resampy: 0.2.2
numba: 0.51.2
numpydoc: 1.1.0
sphinx: 3.2.1
sphinx_rtd_theme: None
sphinxcontrib.versioning: None
sphinx-gallery: None
pytest: 6.1.1
pytest-mpl: None
pytest-cov: None
matplotlib: 3.3.2
presets: None
**Additional context**
- I tried calling `gc.collect()` after a message finishes processing, with no effect.
- I don't use librosa caching (see the check below).
- This usually happens with files of 4+ MB; smaller files don't cause problems.
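A quick check that librosa's cache really is off (librosa only enables its function-level cache when the `LIBROSA_CACHE_DIR` environment variable is set):

```python
import os

# librosa enables its joblib-backed cache only when this variable is set;
# None here means caching is disabled.
print(os.environ.get("LIBROSA_CACHE_DIR"))
```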
I would be very thankful for any help.
**Top GitHub Comments**
@bmcfee So I tried the solution you suggested, but the problem hasn't gone away. At the same time, adding more debug points shows that the worker restarts in the `trim_long_silences` method from the resemblyzer lib. And, interestingly, the worker that called this method continues (!) to work, while the other worker is restarted. So my guess is as follows:
- `worker1` does some job calling `trim_long_silences` (usually 3 times per task);
- `worker2` starts doing its job and calls `trim_long_silences`;
- at this point `worker2` needs some resources (probably some blocking I/O or CPU) that are still occupied by `worker1`, so `worker1` is killed, releasing the resources for `worker2`, and `worker2` continues to work.
I don't have much experience investigating such problems, but I'll try to figure it out.
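One way to see how exactly a worker is being terminated is to log gunicorn's worker lifecycle hooks. A sketch, assuming gunicorn is started with a `gunicorn.conf.py` config file (the hook names are gunicorn's standard server hooks; the log messages are placeholders):

```python
# gunicorn.conf.py
import logging

log = logging.getLogger("gunicorn.error")

def worker_int(worker):
    # Worker received SIGINT or SIGQUIT
    log.warning("worker %s got INT/QUIT", worker.pid)

def worker_abort(worker):
    # Worker received SIGABRT, e.g. after exceeding the sync worker timeout
    log.warning("worker %s got ABORT (possible timeout)", worker.pid)

def worker_exit(server, worker):
    log.warning("worker %s exited", worker.pid)
```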
Actually, this change only affects the gunicorn startup config; previously the worker type was omitted, so the default one (`sync`) was used. In the end, all I changed was specifying the worker type with `gunicorn -k eventlet ...<other config>`.
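For reference, a sketch of the same worker-class setting expressed in a `gunicorn.conf.py` file; the worker count is a placeholder, and the `eventlet` package must be installed for this worker class to load:

```python
# gunicorn.conf.py
worker_class = "eventlet"
workers = 4
```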