PyctcDecode fails when running spawn context (multiprocessing)

I opened a Transformers PR, but it failed because it changes the multiprocessing context from fork to spawn:

from multiprocessing import get_context
.
.
.
pool = get_context("spawn").Pool(num_processes)

The change is in the file src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py.

My question is: how can I run pyctcdecode batch decoding on Windows? Is there any way I can help or contribute to fixing this issue?

Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 1
  • Comments: 5

Top GitHub Comments

1 reaction
lopez86 commented, May 25, 2022

I’ve added some code that allows decoding without multiprocessing, and that also bypasses multiprocessing if a spawn context pool is detected. It’s not a true solution, but it should at least unblock things. If anyone knows a way to actually use the pool with a spawn context without hurting performance, please discuss or consider making a PR.
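The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code that was added to pyctcdecode: `map_with_fallback` and its parameters are hypothetical names, and it inspects a multiprocessing context through the public `get_start_method()` call rather than inspecting an existing pool.

```python
from multiprocessing import get_context


def map_with_fallback(func, items, ctx=None, num_processes=2):
    """Map func over items, using a process pool only for a 'fork' context."""
    # Hypothetical helper: with no context, or any non-fork context
    # (e.g. 'spawn'), skip multiprocessing and run sequentially.
    if ctx is None or ctx.get_start_method() != "fork":
        return [func(item) for item in items]
    # Fork context: workers inherit the parent's memory, so a pool is safe to use.
    with ctx.Pool(num_processes) as pool:
        return pool.map(func, items)
```

For example, `map_with_fallback(str.upper, ["a", "b"], get_context("spawn"))` never starts a worker process and returns the results computed in the calling process.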

1 reaction
lopez86 commented, May 23, 2022

I was looking into this a bit. Because the language model is stored as a class variable, fork works without having to reload the language model, but spawn ends up creating processes where the language model is missing. If the language model is instead made an instance variable, both fork and spawn work, but each process needs to reload the language model, which will likely wipe out any performance improvement from using multiprocessing. I’m not sure what a good solution to this would be.
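The difference can be reproduced with a small stand-in. The sketch below does not use pyctcdecode: `Decoder`, `worker_sees_model`, and `probe` are hypothetical names, and the class-level dictionary only approximates how a model might be held as a class variable.

```python
import multiprocessing as mp


class Decoder:
    # Hypothetical stand-in: loaded models live in a class-level registry,
    # so they belong to the class object in the current process only.
    _models = {}

    @classmethod
    def register(cls, key, model):
        cls._models[key] = model


def worker_sees_model(key):
    # Runs inside a worker: is the model present in this process's registry?
    return key in Decoder._models


def probe(start_method, key="lm-0"):
    # Register at runtime in the parent, then ask one worker what it can see.
    Decoder.register(key, object())  # placeholder for an expensive model
    with mp.get_context(start_method).Pool(1) as pool:
        return pool.apply(worker_sees_model, (key,))
```

When this is saved as a script and `probe` is called under an `if __name__ == "__main__":` guard, `probe("fork")` should return True on platforms that support fork, because the worker inherits the parent's memory. `probe("spawn")` should return False, because the spawned worker re-imports the module and starts with an empty `_models`.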


Top Results From Across the Web

Issues · kensho-technologies/pyctcdecode - GitHub
... search decoder for speech recognition. - Issues · kensho-technologies/pyctcdecode. ... PyctcDecode fails when running spawn context (multiprocessing).

Python multiprocessing with start method 'spawn' doesn't work
When using the spawn start method, the Process object itself is being pickled for use in the child process. In your code, the...

Multiprocessing spawn/forkserver fails to pass Queues
Python fails to pass a Queue when calling Process with multiprocessing.set_start_method set to "spawn" or "forkserver".

Wav2Vec2 - Hugging Face
Currently, only pools created with a 'fork' context can be used. If a 'spawn' pool is passed, it will be ignored and sequential...

fork() vs. spawn() (intermediate) anthony explains #492
multiprocessing: fork() vs. spawn() (intermediate) anthony explains #492.
