gunicorn workers are being killed if used in background/async processes
I have a gunicorn + flask app where I use librosa for WAV preprocessing (`preprocess_wav` from the resemblyzer lib). The files are up to 5 MB. Everything works fine when the input request is processed synchronously: the request comes in, the audio is processed, and the client gets a response once processing finishes. But since working with audio takes time, we decided to move it to an async flow using RabbitMQ, so requests go to a consumer and are processed in the background. What I've noticed is that my gunicorn workers are being restarted (roughly 1 to 5 times per hour, never all of them, only one at a time). A lot of debugging showed that it always happens in the `preprocess_wav` function, and the only thing that happens there is a call to `librosa.resample`.
My question, and my guess, is this: could some resources or data be left over after the background processing finishes, unlike in the synchronous case where all resources belong to a request and are cleared when the request ends? Some details about the situation: I can't reproduce it on my local machine (even in a Docker container); it happens only on a test machine running in a Docker container, probably because there is constant request traffic (about 100 requests/hour).
**To Reproduce**
import librosa
from resemblyzer import preprocess_wav

def load_audio(file, n):
    # Load at the native sample rate, limited to the first n seconds
    wav, source_sr = librosa.load(file, sr=None, duration=n)
    return wav, source_sr

wav, sr = load_audio(file, n)
preprocess_wav(wav, sr)  # worker is restarting here
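For context, in production this code runs inside a RabbitMQ consumer rather than a request handler. Below is a minimal sketch of that setup, assuming the `pika` client and a queue named `audio` that carries file paths; none of these names come from the original setup.

```python
import pika
import librosa
from resemblyzer import preprocess_wav

def on_message(ch, method, properties, body):
    # The message body is assumed to carry a path to the uploaded file.
    path = body.decode()
    wav, source_sr = librosa.load(path, sr=None)
    preprocess_wav(wav, source_sr)  # the step during which a worker gets restarted
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="audio")
channel.basic_consume(queue="audio", on_message_callback=on_message)
channel.start_consuming()
```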
**Expected behavior**
The app keeps working and workers are not restarted.
**Software versions**
Linux-5.13.0-39-generic-x86_64-with-glibc2.10
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
NumPy 1.19.2
SciPy 1.5.2
librosa 0.8.1
INSTALLED VERSIONS
------------------
python: 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
librosa: 0.8.1
audioread: 2.1.9
numpy: 1.19.2
scipy: 1.5.2
sklearn: 0.23.2
joblib: 0.17.0
decorator: 4.4.2
soundfile: 0.10.3
resampy: 0.2.2
numba: 0.51.2
numpydoc: 1.1.0
sphinx: 3.2.1
sphinx_rtd_theme: None
sphinxcontrib.versioning: None
sphinx-gallery: None
pytest: 6.1.1
pytest-mpl: None
pytest-cov: None
matplotlib: 3.3.2
presets: None
**Additional context**
- I tried calling `gc.collect()` after a message finishes processing, with no effect.
- I don't use librosa caching (see the check below).
- This usually happens with files of 4+ MB; smaller files don't cause problems.
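A quick check that librosa's cache really is off (librosa only enables its function-level cache when the `LIBROSA_CACHE_DIR` environment variable is set):

```python
import os

# librosa enables its joblib-backed cache only when this variable is set;
# None here means caching is disabled.
print(os.environ.get("LIBROSA_CACHE_DIR"))
```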
I would be very thankful for any help.
**Top GitHub Comments**
@bmcfee So I tried the solution you suggested, but the problem hasn't gone away. At the same time, adding more debug points shows that the worker restarts in the `trim_long_silences` method from the resemblyzer lib. And, interestingly, the worker that called this method continues (!) to work, while the other worker is restarted. So my guess is as follows:
- `worker1` does some job calling `trim_long_silences` (usually 3 times per task);
- `worker2` starts doing its job and calls `trim_long_silences`;
- at this point `worker2` needs some resources (probably some blocking I/O or CPU) that are still occupied by `worker1`, so `worker1` is killed, releasing the resources for `worker2`, and `worker2` continues to work.
I don't have much experience investigating such problems, but I'll try to figure it out.
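One way to see how exactly a worker is being terminated is to log gunicorn's worker lifecycle hooks. A sketch, assuming gunicorn is started with a `gunicorn.conf.py` config file (the hook names are gunicorn's standard server hooks; the log messages are placeholders):

```python
# gunicorn.conf.py
import logging

log = logging.getLogger("gunicorn.error")

def worker_int(worker):
    # Worker received SIGINT or SIGQUIT
    log.warning("worker %s got INT/QUIT", worker.pid)

def worker_abort(worker):
    # Worker received SIGABRT, e.g. after exceeding the sync worker timeout
    log.warning("worker %s got ABORT (possible timeout)", worker.pid)

def worker_exit(server, worker):
    log.warning("worker %s exited", worker.pid)
```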
Actually, this change only affects the gunicorn startup config; previously the worker type was omitted, so the default one (`sync`) was used. In the end, all I changed was specifying the worker type with `gunicorn -k eventlet ...<other config>`.
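For reference, a sketch of the same worker-class setting expressed in a `gunicorn.conf.py` file; the worker count is a placeholder, and the `eventlet` package must be installed for this worker class to load:

```python
# gunicorn.conf.py
worker_class = "eventlet"
workers = 4
```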