PyctcDecode fails when running spawn context (multiprocessing)
I have opened a Transformers PR, but it failed after changing the multiprocessing context from fork to spawn in src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py:

    from multiprocessing import get_context
    ...
    pool = get_context("spawn").Pool(num_processes)
My question is: how can I run pyctcdecode batch decoding on Windows?
Is there any way I can help, or contribute a fix for this issue?
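One possible workaround (a minimal sketch, assuming a recent transformers version in which Wav2Vec2ProcessorWithLM.batch_decode accepts an optional pool argument and returns an object with a .text field; the checkpoint name and the random "audio" are only illustrative) is to use a fork pool where the fork start method is available and fall back to plain sequential decoding, with no pool, on Windows:

```python
import multiprocessing

import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

# Illustrative checkpoint that ships an n-gram LM next to the acoustic model.
model_id = "patrickvonplaten/wav2vec2-base-100h-with-lm"
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# One second of fake 16 kHz audio, just to produce logits for the example.
dummy_audio = torch.randn(16_000).numpy()
inputs = processor(dummy_audio, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

if multiprocessing.get_start_method() == "fork":
    # fork is the default start method (typically Linux): a pool speeds up beam search.
    with multiprocessing.get_context("fork").Pool(2) as pool:
        text = processor.batch_decode(logits.numpy(), pool=pool).text
else:
    # Windows only supports spawn: skip the pool and decode sequentially.
    text = processor.batch_decode(logits.numpy()).text

print(text)
```

Sequential decoding is slower for large batches, but it sidesteps the spawn-context problem entirely, which matches the behaviour the Hugging Face docs describe for spawn pools (they are ignored and sequential decoding is used).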
Issue Analytics
- State:
- Created a year ago
- Reactions: 1
- Comments:5
Top Results From Across the Web
Issues · kensho-technologies/pyctcdecode - GitHub
... search decoder for speech recognition. - Issues · kensho-technologies/pyctcdecode. ... PyctcDecode fails when running spawn context (multiprocessing).

Python multiprocessing with start method 'spawn' doesn't work
When using the spawn start method, the Process object itself is being pickled for use in the child process. In your code, the...

Multiprocessing spawn/forkserver fails to pass Queues
Python fails to pass a Queue when calling Process with multiprocessing.set_start_method set to "spawn" or "forkserver".

Wav2Vec2 - Hugging Face
Currently, only pools created with a 'fork' context can be used. If a 'spawn' pool is passed, it will be ignored and sequential...

fork() vs. spawn() (intermediate) anthony explains #492
multiprocessing: fork() vs. spawn() (intermediate) anthony explains #492.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I’ve added some code that allows either not using multiprocessing at all or, if a spawn-context pool is detected, bypassing multiprocessing automatically. It’s not a true solution, but it should unblock things at least. If anyone has a way to actually use the pool with a spawn context without hurting performance, please discuss it or consider making a PR.
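A minimal sketch of that kind of guard (illustrative only: the helper name is made up, and checking multiprocessing.get_start_method() is an approximation of detecting a spawn-context pool, since a pool's own context is only exposed through internal attributes):

```python
import multiprocessing
from multiprocessing.pool import Pool
from typing import List, Optional

import numpy as np


def safe_decode_batch(decoder, logits_list: List[np.ndarray],
                      pool: Optional[Pool] = None) -> List[str]:
    """Decode a batch with pyctcdecode, falling back to sequential decoding
    whenever the pool cannot be used safely (e.g. a spawn pool on Windows)."""
    # Spawn workers re-import modules from scratch and would not see the
    # language model loaded in the parent, so only use the pool under fork.
    if pool is not None and multiprocessing.get_start_method() == "fork":
        return decoder.decode_batch(pool, logits_list)
    # Fallback: decode one utterance at a time in the current process.
    return [decoder.decode(logits) for logits in logits_list]
```

decoder here is assumed to be a pyctcdecode BeamSearchDecoderCTC, whose decode_batch(pool, logits_list) and decode(logits) calls follow the documented API; the bypass logic itself is the new part.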
I was looking into this a bit, and what’s happening is that because the language model is stored as a class variable, fork works without having to reload the language model, but spawn ends up creating processes where the language model is missing. If the language model is instead made an instance variable, both fork and spawn work, but each process then needs to reload the language model, which would likely wipe out any performance gains from multiprocessing. I’m not sure what a good solution to this would be.
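For anyone who wants to see that difference in isolation, here is a small self-contained illustration (the Decoder class and its model attribute are toy stand-ins for the real language model, not pyctcdecode code):

```python
import multiprocessing


class Decoder:
    # Class-level "model", mimicking a language model attached as a class variable.
    model = None


def worker(_):
    # Under fork the child inherits the parent's memory, so Decoder.model is set.
    # Under spawn the module is re-imported in the child and Decoder.model is None.
    return Decoder.model is not None


if __name__ == "__main__":
    Decoder.model = "loaded-language-model"

    for method in ("fork", "spawn"):
        try:
            ctx = multiprocessing.get_context(method)
        except ValueError:
            continue  # "fork" is not available on Windows
        with ctx.Pool(2) as pool:
            print(method, pool.map(worker, range(2)))

    # Typical output on Linux:
    #   fork [True, True]
    #   spawn [False, False]
```

Making the model an instance variable would instead mean it gets pickled or reloaded for every spawned worker, which is the performance concern mentioned above.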