Deadlock in _python_exit

This can be reproduced in joblib by uncommenting a test case:

Here is the traceback we get when pressing Ctrl-C on the frozen pytest run:

$ pytest -xk test_nested_parallel_warnings
==================================================================== test session starts ====================================================================
platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0
rootdir: /Users/ogrisel/code/joblib, configfile: setup.cfg, testpaths: joblib
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, timeout-2.1.0
collecting ... /Users/ogrisel/code/joblib/joblib/executor.py:108: PytestCollectionWarning: cannot collect test class '_TestingMemmappingExecutor' because it has a __init__ constructor (from: joblib/test/test_memmapping.py)
  class _TestingMemmappingExecutor(MemmappingExecutor):
collected 1225 items / 1219 deselected / 2 skipped / 6 selected                                                                                             

joblib/test/test_parallel.py ......                                                                                                                   [100%]

======================================================= 6 passed, 2 skipped, 1219 deselected in 2.24s =======================================================
[DEBUG:MainProcess:MainThread] Interpreter shutting down. Waking up executor_manager_thread [(<_ExecutorManagerThread(ExecutorManagerThread, started 6172438528)>, (<unlocked _thread.lock object at 0x1042a82c0>, <joblib.externals.loky.process_executor._ThreadWakeup object at 0x10441c1c0>))]
[DEBUG:MainProcess:ExecutorManagerThread] closing call_queue
[DEBUG:MainProcess:ExecutorManagerThread] telling queue thread to quit
[DEBUG:MainProcess:ExecutorManagerThread] Queue.join_thread()
[DEBUG:MainProcess:ExecutorManagerThread] closing result_queue
[DEBUG:MainProcess:ExecutorManagerThread] closing thread_wakeup
[DEBUG:MainProcess:ExecutorManagerThread] joining processes
[DEBUG:MainProcess:QueueFeederThread] feeder thread got sentinel -- exiting
^CException ignored in: <module 'threading' from '/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/threading.py'>
Traceback (most recent call last):
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/threading.py", line 1537, in _shutdown
    atexit_call()
  File "/Users/ogrisel/code/joblib/joblib/externals/loky/process_executor.py", line 193, in _python_exit
    thread.join()
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/threading.py", line 1096, in join
    self._wait_for_tstate_lock()
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
KeyboardInterrupt: 
[INFO:MainProcess:MainThread] process shutting down
[DEBUG:MainProcess:MainThread] running all "atexit" finalizers with priority >= 0
[INFO:MainProcess:MainThread] calling join() for process LokyProcess-1
[INFO:MainProcess:MainThread] calling join() for process LokyProcess-2
[DEBUG:MainProcess:MainThread] running the remaining "atexit" finalizers
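
The traceback above shows the main thread blocked in `thread.join()` (called with no timeout) inside `_python_exit`, waiting for the executor manager thread. The following is a minimal sketch of that kind of wait using only the standard `threading` module. It is not loky's code: `join_with_timeout`, `never_set`, and `stuck` are hypothetical names, and the stuck thread is simulated with an event that is never set during the wait.

```python
import threading


def join_with_timeout(thread, timeout):
    """Join `thread`; return True if it finished within `timeout` seconds."""
    thread.join(timeout)
    return not thread.is_alive()


# Hypothetical stand-in for a manager thread that is stuck waiting on
# something that never happens (here: an event nobody sets).
never_set = threading.Event()
stuck = threading.Thread(target=never_set.wait, daemon=True)
stuck.start()

# A plain stuck.join() would block indefinitely at this point. With a
# timeout, the caller gets control back and can report the problem.
finished = join_with_timeout(stuck, timeout=0.2)
print("manager thread finished:", finished)

# Unblock the thread so that this sketch itself exits cleanly.
never_set.set()
stuck.join()
```

A bounded join like this only makes the hang observable. It does not explain why the joined thread never finishes, which is the actual question in this issue.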

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 14 (6 by maintainers)

Top GitHub Comments

1 reaction
lesteve commented, Sep 12, 2022

My feeling is that it could well be the same issue.

Another variation I tried: on the main branch, edit joblib/test/test_parallel.py and uncomment the nested parallel test with loky nested within loky. When I run only the loky-within-loky test, the test passes but I get a timeout at shutdown, similar to https://github.com/joblib/loky/issues/363#issuecomment-1240887687.

❯ pytest joblib/test/test_parallel.py -v -k 'nested_parallel_warning and loky-loky'
==================================================================== test session starts ====================================================================
platform linux -- Python 3.9.13, pytest-7.1.3, pluggy-1.0.0 -- /home/lesteve/miniconda3/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.9.13', 'Platform': 'Linux-5.15.0-47-generic-x86_64-with-glibc2.35', 'Packages': {'pytest': '7.1.3', 'py': '1.11.0', 'pluggy': '1.0.0'}, 'Plugins': {'anyio': '3.4.0', 'forked': '1.4.0', 'metadata': '2.0.1', 'cov': '3.0.0', 'asyncio': '0.16.0', 'json-report': '1.5.0', 'xdist': '2.5.0'}}
rootdir: /home/lesteve/dev/joblib, configfile: setup.cfg
plugins: anyio-3.4.0, forked-1.4.0, metadata-2.0.1, cov-3.0.0, asyncio-0.16.0, json-report-1.5.0, xdist-2.5.0
collected 393 items / 392 deselected / 1 selected                                                                                                           

joblib/test/test_parallel.py::test_nested_parallel_warnings[loky-loky-False] PASSED                                                                   [100%]

============================================================= 1 passed, 392 deselected in 3.51s =============================================================
[DEBUG:MainProcess:MainThread] Interpreter shutting down. Waking up executor_manager_thread [(<_ExecutorManagerThread(ExecutorManagerThread, started 140602304767552)>, (<unlocked _thread.lock object at 0x7fe063e7aa80>, <joblib.externals.loky.process_executor._ThreadWakeup object at 0x7fe063e7af70>))]
[DEBUG:MainProcess:ExecutorManagerThread] closing call_queue
[DEBUG:MainProcess:ExecutorManagerThread] telling queue thread to quit
[DEBUG:MainProcess:ExecutorManagerThread] Queue.join_thread()
[DEBUG:MainProcess:QueueFeederThread] feeder thread got sentinel -- exiting
[DEBUG:MainProcess:ExecutorManagerThread] closing result_queue
[DEBUG:MainProcess:ExecutorManagerThread] closing thread_wakeup
[DEBUG:MainProcess:ExecutorManagerThread] joining processes
Timeout (0:01:00)!
Thread 0x00007fe0866e2640 (most recent call first):
  File "/home/lesteve/dev/joblib/joblib/externals/loky/backend/popen_loky_posix.py", line 56 in poll
  File "/home/lesteve/dev/joblib/joblib/externals/loky/backend/popen_loky_posix.py", line 77 in wait
  File "/home/lesteve/miniconda3/lib/python3.9/multiprocessing/process.py", line 149 in join
  File "/home/lesteve/dev/joblib/joblib/externals/loky/process_executor.py", line 819 in join_executor_internals
  File "/home/lesteve/dev/joblib/joblib/externals/loky/process_executor.py", line 564 in run
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x00007fe069756640 (most recent call first):
  File "/home/lesteve/miniconda3/lib/python3.9/concurrent/futures/thread.py", line 81 in _worker
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 917 in run
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x00007fe08ccb3740 (most recent call first):
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 1080 in _wait_for_tstate_lock
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 1060 in join
  File "/home/lesteve/dev/joblib/joblib/externals/loky/process_executor.py", line 193 in _python_exit
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 1447 in _shutdown
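
A dump in this format ("Timeout (...)!" followed by one stack per thread, most recent call first) is what Python's standard `faulthandler.dump_traceback_later` writes. How it was enabled for the run above is not stated, so the sketch below is only an assumption-laden illustration of how such a dump can be produced for a process that appears stuck. The `blocker` event and the `stuck-worker` thread are hypothetical stand-ins for a hung thread.

```python
import faulthandler
import tempfile
import threading
import time

# Hypothetical worker that blocks until the event is set.
blocker = threading.Event()
worker = threading.Thread(target=blocker.wait, name="stuck-worker", daemon=True)
worker.start()

with tempfile.TemporaryFile(mode="w+") as dump_file:
    # Schedule a dump of every thread's stack after 0.2 s unless cancelled.
    # The file must stay open until the dump is written or cancelled.
    faulthandler.dump_traceback_later(0.2, exit=False, file=dump_file)
    time.sleep(0.5)  # simulate the program still being stuck
    faulthandler.cancel_dump_traceback_later()
    dump_file.seek(0)
    report = dump_file.read()

# Release the worker so that this sketch exits cleanly.
blocker.set()
worker.join()
print(report)
```

In a real debugging session the dump would usually go to `sys.stderr` (the default) with a timeout long enough that it only fires when shutdown genuinely hangs.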

On your debug-loky-deadlock-in-joblib branch, the shutdown does not time out, but I get lingering loky processes for some reason. Looking at https://github.com/joblib/joblib/compare/master...ogrisel:joblib:debug-loky-deadlock-in-joblib, the branch appears to add only debug code, so I am not sure what is happening.

1 reaction
lesteve commented, Sep 9, 2022

Maybe this is OSX-specific; I cannot reproduce the issue on my Linux machine:

❯ pytest joblib/test/test_parallel.py -k nested_parallel_warning -v
============================================================================================================ test session starts ============================================================================================================
platform linux -- Python 3.9.13, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/lesteve/miniconda3/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.9.13', 'Platform': 'Linux-5.15.0-47-generic-x86_64-with-glibc2.35', 'Packages': {'pytest': '6.2.5', 'py': '1.11.0', 'pluggy': '1.0.0'}, 'Plugins': {'anyio': '3.4.0', 'forked': '1.4.0', 'metadata': '2.0.1', 'cov': '3.0.0', 'asyncio': '0.16.0', 'json-report': '1.5.0', 'xdist': '2.5.0'}}
rootdir: /home/lesteve/dev/joblib, configfile: setup.cfg
plugins: anyio-3.4.0, forked-1.4.0, metadata-2.0.1, cov-3.0.0, asyncio-0.16.0, json-report-1.5.0, xdist-2.5.0
collected 388 items / 387 deselected / 1 selected                                                                                                                                                                                           

joblib/test/test_parallel.py::test_nested_parallel_warnings[loky-loky-False] PASSED                                                                                                                                                   [100%]

===================================================================================================== 1 passed, 387 deselected in 2.95s =====================================================================================================
[DEBUG:MainProcess:MainThread] Interpreter shutting down. Waking up executor_manager_thread [(<_ExecutorManagerThread(ExecutorManagerThread, started 140435047106112)>, (<unlocked _thread.lock object at 0x7fb976c96f90>, <joblib.externals.loky.process_executor._ThreadWakeup object at 0x7fb976c3f5e0>))]
[DEBUG:MainProcess:ExecutorManagerThread] releasing worker exit lock on 1320771: LokyProcess-1
[DEBUG:MainProcess:ExecutorManagerThread] releasing worker exit lock on 1320772: LokyProcess-2
[DEBUG:MainProcess:ExecutorManagerThread] found 2 processes to stop
[DEBUG:MainProcess:ExecutorManagerThread] sent 2 sentinels to the call queue
[DEBUG:MainProcess:ExecutorManagerThread] closing call_queue
[DEBUG:MainProcess:ExecutorManagerThread] telling queue thread to quit
[DEBUG:MainProcess:ExecutorManagerThread] Queue.join_thread()
[DEBUG:MainProcess:QueueFeederThread] feeder thread got sentinel -- exiting
[DEBUG:MainProcess:ExecutorManagerThread] closing result_queue
[DEBUG:MainProcess:ExecutorManagerThread] closing thread_wakeup
[DEBUG:MainProcess:ExecutorManagerThread] joining 2 processes
[DEBUG:MainProcess:ExecutorManagerThread] joining process 1320771: LokyProcess-1
[DEBUG:MainProcess:ExecutorManagerThread] joined process 1320771: LokyProcess-1
[DEBUG:MainProcess:ExecutorManagerThread] joining process 1320772: LokyProcess-2
[DEBUG:MainProcess:ExecutorManagerThread] joined process 1320772: LokyProcess-2
[DEBUG:MainProcess:ExecutorManagerThread] executor management thread clean shutdown of worker processes: [1320771, 1320772]
[DEBUG:MainProcess:MainThread] Successfully deleted /dev/shm/joblib_memmapping_folder_1320763_a2813182f81947a4b81141fcd858f043_8fdec611b3e14f8a8d7cd5e07070fcce
[DEBUG:MainProcess:MainThread] Successfully deleted /dev/shm/joblib_memmapping_folder_1320763_a7f2a043e6f1478f89f82ddcfda4c2a3_bd11a0b27a244a6a858aa8ea46cf3120
[INFO:MainProcess:MainThread] process shutting down
[DEBUG:MainProcess:MainThread] running all "atexit" finalizers with priority >= 0
[DEBUG:MainProcess:MainThread] running the remaining "atexit" finalizers