RuntimeError: Cannot re-initialize CUDA in forked subprocess
How to reproduce
I’m getting the following error while trying to run the example in the getting-started document:
Process ParallelProcess-1:
Traceback (most recent call last):
File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/process.py", line 251, in run
engine = ParallelEngine(
File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 53, in __init__
self.mp_group = self.create_process_group(backend)
File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 106, in create_process_group
torch.cuda.set_device(int(os.getenv("LOCAL_RANK", "0")))
File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/cuda/__init__.py", line 314, in set_device
torch._C._cuda_setDevice(device)
File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/cuda/__init__.py", line 207, in _lazy_init
raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Process ParallelProcess-2:
Traceback (most recent call last):
File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/process.py", line 251, in run
engine = ParallelEngine(
File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 53, in __init__
self.mp_group = self.create_process_group(backend)
File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 104, in create_process_group
dist.init_process_group(backend=backend)
File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 627, in init_process_group
_store_based_barrier(rank, store, timeout)
File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 242, in _store_based_barrier
worker_count = store.add(store_key, 0)
RuntimeError: Connection reset by peer
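For context, here is a minimal, self-contained sketch of what the first traceback is complaining about (this is generic PyTorch, not parallelformers code): once the parent process has touched CUDA, a forked child cannot re-initialize it, whereas a spawned child starts a fresh interpreter and can.

```python
# Generic illustration (not parallelformers code): CUDA state does not
# survive fork(), so a forked child hits the RuntimeError above, while a
# spawned child starts with a clean interpreter and initializes CUDA itself.
import torch
import torch.multiprocessing as mp

def use_gpu(rank):
    torch.cuda.set_device(rank)  # triggers CUDA init in the child
    print(f"child sees GPU {rank}: {torch.cuda.get_device_name(rank)}")

if __name__ == "__main__":
    torch.cuda.init()                # parent initializes CUDA first
    ctx = mp.get_context("spawn")    # with "fork" this reproduces the error
    p = ctx.Process(target=use_gpu, args=(0,))
    p.start()
    p.join()
```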
This is my code. I’m running it on an AWS g5.12xlarge instance with 4 GPUs:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
from parallelformers import parallelize
parallelize(model, num_gpus=2, fp16=True, verbose='detail')
inputs = tokenizer("Parallelformers is", return_tensors="pt")
outputs = model.generate(
**inputs,
num_beams=5,
no_repeat_ngram_size=4,
max_length=15,
)
print(f"Output: {tokenizer.batch_decode(outputs)[0]}")
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A10G On | 00000000:00:1B.0 Off | 0 |
| 0% 29C P8 19W / 300W | 2MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A10G On | 00000000:00:1C.0 Off | 0 |
| 0% 29C P8 16W / 300W | 2MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A10G On | 00000000:00:1D.0 Off | 0 |
| 0% 29C P8 16W / 300W | 2MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
| 0% 30C P8 15W / 300W | 2MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I pip installed multiprocess (https://pypi.org/project/multiprocess/) because initially I kept getting a `multiprocess not found` error on `import multiprocess as mp`. Then I noticed there was a PR by @Oaklight that removed `torch.multiprocessing`. Maybe I’m not using the right multiprocessing library? Reverting back to `torch.multiprocessing` produced the same error @Oaklight had noticed.
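For what it’s worth, `multiprocess` on PyPI is a third-party fork of the stdlib module (it pickles with dill); it is not the same thing as stdlib `multiprocessing` or `torch.multiprocessing`. A quick way to check which module actually resolves, and the start method in effect (assumption: run in the same venv as the failing script):

```python
# Sanity check: confirm which module is imported and what the default
# start method is. On Linux the stdlib default is "fork", which is exactly
# what the CUDA error above is objecting to.
import multiprocessing as mp  # stdlib; NOT the PyPI "multiprocess" package

print(mp.__file__)            # should live inside the Python installation
print(mp.get_start_method())  # "fork" by default on Linux
```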
Environment
- OS: Ubuntu
- Python version: 3.9
- Transformers version: 4.21.0
- Whether to use Docker: No
- Misc.: CUDA 11.6
Thanks for reporting. I’ll apply your suggestion!
OK, the `multiprocess` library is no good. If I switch to `import multiprocessing as mp` and put the `if __name__ == '__main__':` guard in, it works fine.

Suggestion: replace `import multiprocess as mp` with `import multiprocessing as mp`, and in the instructions tell users to wrap the parallelize call in an `if __name__ == '__main__':` guard.
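Putting that together, a guarded version of the reproduction script from above would look like this (untested sketch; same model and generation arguments as the original report):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

if __name__ == "__main__":
    # Everything that touches the model stays under the __main__ guard so
    # that the child processes parallelformers starts can re-import this
    # module without re-running the parallelization itself.
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

    parallelize(model, num_gpus=2, fp16=True, verbose="detail")

    inputs = tokenizer("Parallelformers is", return_tensors="pt")
    outputs = model.generate(
        **inputs,
        num_beams=5,
        no_repeat_ngram_size=4,
        max_length=15,
    )
    print(f"Output: {tokenizer.batch_decode(outputs)[0]}")
```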