Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

gunicorn workers are being killed if used in background/async processes

See original GitHub issue

I have a gunicorn + Flask app where I use librosa for WAV preprocessing (preprocess_wav from the resemblyzer lib). The files are up to 5 MB. Everything seems to work fine when the input request is processed synchronously: the request comes in, is processed, and the client gets a response once processing finishes. But since working with audio takes time, we decided to move it to an asynchronous flow using RabbitMQ, so requests go to a consumer and are processed in the background. What I’ve noticed is that my gunicorn workers are being restarted (~1-5 times per hour, not all of them, only one at a time). A lot of debugging showed me that it always happens inside the preprocess_wav function, and the only thing that happens there is a call to librosa.resample.

My question, and my guess, is: maybe some resources or data are left over after processing finishes, unlike synchronous processing, where all resources belong to a request and are cleaned up when the request ends? Some details about the situation: I can’t reproduce it on a local machine (even in a Docker container); it happens only on a test machine running in a Docker container, probably because there is constant request traffic (about 100 requests / hour).
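
For context, the background consumer described above would look roughly like this (a minimal sketch, assuming pika and a hypothetical audio queue and message format; not the author’s actual code):

import pika
from resemblyzer import preprocess_wav

def on_message(channel, method, properties, body):
    # body carries the path of the uploaded file (hypothetical message format)
    wav = preprocess_wav(body.decode())
    # ... run the rest of the pipeline on `wav` ...

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="audio")
channel.basic_consume(queue="audio", on_message_callback=on_message, auto_ack=True)
channel.start_consuming()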

To Reproduce

import librosa
from resemblyzer import preprocess_wav

def load_audio(file, n):
    # load at the native sampling rate, up to n seconds of audio
    wav, source_sr = librosa.load(file, sr=None, duration=n)
    return wav, source_sr

wav, source_sr = load_audio(file, n)  # `file` and `n` come from the incoming task
preprocess_wav(wav, source_sr)  # worker is restarting here

Expected behavior

The app keeps working without workers being restarted.

Software versions

Linux-5.13.0-39-generic-x86_64-with-glibc2.10
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0]
NumPy 1.19.2
SciPy 1.5.2
librosa 0.8.1
INSTALLED VERSIONS
------------------
python: 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0]
librosa: 0.8.1
audioread: 2.1.9
numpy: 1.19.2
scipy: 1.5.2
sklearn: 0.23.2
joblib: 0.17.0
decorator: 4.4.2
soundfile: 0.10.3
resampy: 0.2.2
numba: 0.51.2
numpydoc: 1.1.0
sphinx: 3.2.1
sphinx_rtd_theme: None
sphinxcontrib.versioning: None
sphinx-gallery: None
pytest: 6.1.1
pytest-mpl: None
pytest-cov: None
matplotlib: 3.3.2
presets: None

Additional context

  • I tried calling gc.collect() after a message finishes processing, with no effect.
  • I don’t use librosa caching.
  • This usually happens with files of 4+ MB; smaller files don’t cause problems.

I would be very thankful for any help.
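
One way to test the leftover-resources guess (a sketch, not from the thread; assumes psutil is installed, and handle_task is a hypothetical stand-in for the consumer’s work) is to log the worker’s resident memory around each task and watch whether it grows:

import os
import psutil

proc = psutil.Process(os.getpid())

def log_rss(tag):
    # print resident set size in MB so growth across tasks shows up in the worker logs
    print(f"{tag}: rss={proc.memory_info().rss / 1e6:.1f} MB", flush=True)

# hypothetical usage inside the consumer callback:
#   log_rss("before task"); handle_task(body); log_rss("after task")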

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

2 reactions
frankiedrake commented, Apr 26, 2022

@bmcfee So I tried the solution you suggested, but the problem hasn’t gone away. At the same time, adding more debug points shows that the worker restarts in the method trim_long_silences from the resemblyzer lib. And, interestingly, the worker that called this method continues (!) to work, while a different worker is restarted. So my guess is as follows:

  • worker1 does some work calling trim_long_silences (usually 3 times per task)
  • worker2 starts doing its job and calls trim_long_silences; at this point worker2 needs some resources (probably some blocking I/O or CPU) that are still occupied by worker1, so worker1 is killed, releasing the resources, and worker2 continues to work.

I don’t have much experience investigating such problems, but I’ll try to figure it out.
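
One way to get more signal on where a worker actually dies (a sketch, not something suggested in this thread) is to enable faulthandler in every gunicorn worker, so a fatal signal dumps the Python traceback of whatever the worker was executing:

# gunicorn.conf.py -- post_fork is a standard gunicorn server hook
import faulthandler
import signal

def post_fork(server, worker):
    # runs in each worker process right after it is forked from the master
    faulthandler.enable()                  # traceback on SIGSEGV, SIGFPE, SIGABRT, ...
    faulthandler.register(signal.SIGTERM)  # also dump one when the master terminates the worker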

1 reaction
frankiedrake commented, Jun 22, 2022

> @frankiedrake congrats … it didn’t sound easy at all to me! ☺️ Maybe share a link to your code repository here? That will benefit future users

Actually, this change only affects the gunicorn startup config; previously the worker type was omitted and the default one (sync) was used. In the end, all I changed was specifying the worker type with gunicorn -k eventlet ...<other config>
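
For reference, the equivalent setting in a gunicorn config file looks roughly like this (a sketch; only worker_class corresponds to the -k eventlet flag from the comment, the other values are illustrative):

# gunicorn.conf.py
worker_class = "eventlet"  # was the implicit default "sync" before the fix
workers = 4                # illustrative; keep whatever the deployment already uses
timeout = 30               # gunicorn's default; a sync worker blocking past this is killed by the master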


Top Results From Across the Web

Design — Gunicorn 20.1.0 documentation
Gunicorn is based on the pre-fork worker model. This means that there is a central master process that manages a set of worker...
Async worker on gunicorn seems blocking - Stack Overflow
It behaves as is it is a synchronous worker, one request at a time. Is this the normal behavior? Am I missing something...
Python agent and Gunicorn WSGI web server
With the background thread then being subsequently killed when worker processes are forked, no data will be reported for the actual web application....
Gunicorn Documentation - Read the Docs
Gunicorn Documentation, Release 19.1.0. Gunicorn 'Green Unicorn' is a Python WSGI HTTP Server for UNIX. It's a pre-fork worker model ported ...
Better performance by optimizing Gunicorn config - Medium
The role of the master process is to make sure that the number of workers is the same as the ones defined in...
