
[QUESTION] interplay with multiprocessing


Describe the bug

I’m trying to run parallel tasks with a per-task timeout (using multiprocessing) inside an API method. When I try to terminate the child processes after the time limit, the server process shuts down and disconnects.

To Reproduce

  1. Create a file: repro.py
import os
import time
import uvicorn
from concurrent.futures import ProcessPoolExecutor, TimeoutError


def simple_routine(sleep_for):
    print(f"PID {os.getpid()} has sleep time: {sleep_for}")
    time.sleep(sleep_for)
    return "done"


def test_endpoint():
    print(f"main process: {os.getpid()}")

    START_TIME = time.time()
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(simple_routine, 1), 
            pool.submit(simple_routine, 10), 
        ]

        results = []
        for fut in futures:
            try:
                results.append(fut.result(timeout=2))
            except TimeoutError:
                results.append("not done")

        # terminate the processes which are still running
        for pid, proc in pool._processes.items():
            print("terminating pid ", pid)
            proc.terminate()
    
    print("exiting at: ", int(time.time() - START_TIME))
    return True


async def app(scope, receive, send):
    await send({
        'type': 'http.response.start',
        'status': 200,
        'headers': [
            [b'content-type', b'text/plain'],
        ]
    })
    
    test_endpoint()
    
    await send({
        'type': 'http.response.body',
        'body': b'Hello, world!',
    })


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=5000)
  2. Run it as python repro.py.
  3. Open another Python interpreter and make this web request.
import requests
for _ in range(20):
    print(requests.get("http://localhost:5000/").text)
  4. The server process shuts down after the first request.

Expected behavior

We start two processes, one of which exceeds the time limit, after which we try to terminate it. The server shouldn’t shut down; it should continue serving requests. Interestingly, the server doesn’t actually exit until the long-running process is complete.

INFO:     Started server process [7041]
INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     127.0.0.1:44954 - "GET / HTTP/1.1" 200 OK
main process: 7041
PID 7060 has sleep time: 1
PID 7061 has sleep time: 10
terminating pid  7060
terminating pid  7061
exiting at:  10
INFO:     Shutting down
INFO:     Finished server process [7041]

With Flask, an identical app behaves as expected.

main process: 1015
PID 1035 has run time: 1
PID 1039 has run time: 1
PID 1038 has run time: 10
terminating pid  1035
terminating pid  1038
terminating pid  1039
exiting at:  2

127.0.0.1 - - [09/Jan/2020 08:51:37] "POST /test-endpoint HTTP/1.1" 200 -

Environment

  • OS: Ubuntu 18.04.1 LTS
  • Uvicorn Version: 0.11.1
  • Python version: 3.6.8

Additional context

This came up while trying to port a WSGI application to FastAPI - link. At the suggestion of @dmontagu, I tried to reproduce it with Starlette and with plain uvicorn, and the error persists.

Hypercorn shows similar behavior in that the application shuts down after serving the first request. So the issue likely has something to do with how async servers manage processes. Could you point me to where I might look to solve this?

Thank you for looking.


Top GitHub Comments

7 reactions
Mixser commented, Jun 15, 2022

Hi @selimb, here is an explanation of what is happening and why.

asyncio sets up signal handling in a specific way: it calls signal.set_wakeup_fd and passes it the fd of an open socket. After that, any signal sent to the process is written to this socket/fd.

Any child process inherits not only the signal handlers’ behavior but also the open socket. As a result, when we send a signal to the child process, it is written to the socket and the parent process receives it too, even though the signal wasn’t sent to it; likewise, if you send a signal to the parent process, the child process receives it as well.
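To see this mechanism in isolation, here is a minimal, Unix-only sketch (mine, not from the thread) showing that a delivered signal is written as a single byte into the wakeup fd:

import os
import signal
import socket

# Emulate what asyncio does internally: register a non-blocking socket
# as the wakeup fd, so delivered signals are written into it.
rsock, wsock = socket.socketpair()
wsock.setblocking(False)
signal.set_wakeup_fd(wsock.fileno())

# A Python-level handler must be installed for the signal to be caught.
signal.signal(signal.SIGUSR1, lambda signum, frame: None)

# Send ourselves the signal; its number arrives as one byte on the socket.
os.kill(os.getpid(), signal.SIGUSR1)
print(rsock.recv(1)[0])  # e.g. 10 (the SIGUSR1 number) on Linux

A forked child inherits both the handler registration and the open socket, which is why signals leak across the fork, as described above.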

To avoid this behavior, execute the following code at the very beginning of the child process:

import signal

signal.set_wakeup_fd(-1)  # don't send signals into the shared socket

signal.signal(signal.SIGTERM, signal.SIG_DFL)  # reset SIGTERM handler to default
signal.signal(signal.SIGINT, signal.SIG_DFL)   # reset SIGINT handler to default
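
If you are using ProcessPoolExecutor as in the repro above, a convenient place to run this reset is the pool's initializer argument (available since Python 3.7). A sketch based on Mixser's suggestion; the helper name is mine:

import signal
from concurrent.futures import ProcessPoolExecutor


def _reset_signals():
    # Runs once in every worker process before it accepts work:
    # detach from the inherited wakeup fd and restore default handlers.
    signal.set_wakeup_fd(-1)
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    signal.signal(signal.SIGINT, signal.SIG_DFL)


pool = ProcessPoolExecutor(max_workers=2, initializer=_reset_signals)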

PS. I downloaded the example from https://bugs.python.org/issue43064 and added signal.set_wakeup_fd(-1) as the first line of the worker_sync method. As a result, I got the expected behavior (one handler call for the main process, and three for the children):

[4560] handling signal with asyncio
[4560] main
[4561] worker sleeping...
[4560] 3 procs still alive
[4563] worker sleeping...
[4562] worker sleeping...
[4560] 3 procs still alive
[4560] 3 procs still alive
^C[4563] handle_sig_worker (2, <frame at 0x1012d8220, file '/Users/mike/Work/fast-api-multiprocessing-problems/exp.py', line 77, code worker_sync>) {}
[4562] handle_sig_worker (2, <frame at 0x1012d8220, file '/Users/mike/Work/fast-api-multiprocessing-problems/exp.py', line 77, code worker_sync>) {}
[4561] handle_sig_worker (2, <frame at 0x1012c7d60, file '/Users/mike/Work/fast-api-multiprocessing-problems/exp.py', line 77, code worker_sync>) {}
[4560] handle_sig_main () {}
[4560] 3 procs still alive
[4560] 3 procs still alive
[4561] worker done
[4563] worker done
[4562] worker done
[4560] no procs alive
[4560] main done
3 reactions
johnthagen commented, Jun 3, 2021

For FastAPI usage, I solved this by setting multiprocessing to use the "spawn" start method in FastAPI’s startup event handler:

import multiprocessing

from fastapi import FastAPI

...
app = FastAPI()


@app.on_event("startup")
def startup_event() -> None:
    multiprocessing.set_start_method("spawn")
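
If changing the global start method is too invasive, note that ProcessPoolExecutor also accepts a per-pool mp_context argument (Python 3.7+), so "spawn" can be confined to the one pool that needs it. A sketch, assuming the rest of the repro stays the same:

import multiprocessing
from concurrent.futures import ProcessPoolExecutor

# "spawn" workers start from a fresh interpreter, so they inherit neither
# the event loop's wakeup fd nor its signal handlers.
spawn_ctx = multiprocessing.get_context("spawn")
pool = ProcessPoolExecutor(max_workers=2, mp_context=spawn_ctx)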
