RuntimeError: Cannot re-initialize CUDA in forked subprocess

See original GitHub issue

How to reproduce

I’m getting the following error while trying to run the example in the getting started document:

Process ParallelProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/process.py", line 251, in run
    engine = ParallelEngine(
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 53, in __init__
    self.mp_group = self.create_process_group(backend)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 106, in create_process_group
    torch.cuda.set_device(int(os.getenv("LOCAL_RANK", "0")))
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/cuda/__init__.py", line 314, in set_device
    torch._C._cuda_setDevice(device)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/cuda/__init__.py", line 207, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Process ParallelProcess-2:
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/process.py", line 251, in run
    engine = ParallelEngine(
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 53, in __init__
    self.mp_group = self.create_process_group(backend)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 104, in create_process_group
    dist.init_process_group(backend=backend)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 627, in init_process_group
    _store_based_barrier(rank, store, timeout)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 242, in _store_based_barrier
    worker_count = store.add(store_key, 0)
RuntimeError: Connection reset by peer

This is my code. I’m running it on an AWS g5.12xlarge instance with 4 GPUs:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

from parallelformers import parallelize

parallelize(model, num_gpus=2, fp16=True, verbose='detail')

inputs = tokenizer("Parallelformers is", return_tensors="pt")

outputs = model.generate(
    **inputs,
    num_beams=5,
    no_repeat_ngram_size=4,
    max_length=15,
)

print(f"Output: {tokenizer.batch_decode(outputs)[0]}")


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1B.0 Off |                    0 |
|  0%   29C    P8    19W / 300W |      2MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A10G         On   | 00000000:00:1C.0 Off |                    0 |
|  0%   29C    P8    16W / 300W |      2MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A10G         On   | 00000000:00:1D.0 Off |                    0 |
|  0%   29C    P8    16W / 300W |      2MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   30C    P8    15W / 300W |      2MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I pip installed multiprocess (https://pypi.org/project/multiprocess/) because, when importing multiprocess as mp, I initially kept getting a “multiprocess not found” error. Then I noticed there was a PR by @Oaklight that removed torch.multiprocessing. Maybe I’m not using the right multiprocessing library? Reverting back to torch.multiprocessing caused the same error noticed by @Oaklight.
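
For context, the “spawn” hint in the error message refers to the standard multiprocessing start-method API sketched below. This is a generic illustration of that pattern, not code taken from parallelformers, and it is not claimed to resolve this issue on its own.

# Generic illustration of the 'spawn' start method mentioned in the error message.
# Not taken from parallelformers; shown only to clarify what the message refers to.
import torch
import torch.multiprocessing as mp

def worker(rank):
    # CUDA is initialized inside the child process only, which is safe under 'spawn'.
    torch.cuda.set_device(rank)
    print(f"worker {rank} is using device {torch.cuda.current_device()}")

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    processes = [mp.Process(target=worker, args=(i,)) for i in range(2)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()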

Environment

  • OS: Ubuntu
  • Python version: 3.9
  • Transformers version: 4.21.0
  • Whether to use Docker: No
  • Misc.: CUDA 11.6

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

1 reaction
hyunwoongko commented, Jul 27, 2022

Thanks for reporting. I’ll apply your suggestion. Thanks!

1 reaction
cabal-daniel commented, Jul 27, 2022

OK, the multiprocess library is no good:

(.venv) ubuntu@ip-172-31-7-113:~$ python opt_multi.py 
Process ParallelProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/process.py", line 251, in run
    engine = ParallelEngine(
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 53, in __init__
    self.mp_group = self.create_process_group(backend)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 106, in create_process_group
    torch.cuda.set_device(int(os.getenv("LOCAL_RANK", "0")))
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/cuda/__init__.py", line 314, in set_device
    torch._C._cuda_setDevice(device)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/cuda/__init__.py", line 207, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Process ParallelProcess-3:
Process ParallelProcess-4:
Process ParallelProcess-2:
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/process.py", line 251, in run
    engine = ParallelEngine(
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 53, in __init__
    self.mp_group = self.create_process_group(backend)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 104, in create_process_group
    dist.init_process_group(backend=backend)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 627, in init_process_group
    _store_based_barrier(rank, store, timeout)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 242, in _store_based_barrier
    worker_count = store.add(store_key, 0)
RuntimeError: Connection reset by peer
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/process.py", line 251, in run
    engine = ParallelEngine(
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/process.py", line 251, in run
    engine = ParallelEngine(
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 53, in __init__
    self.mp_group = self.create_process_group(backend)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 53, in __init__
    self.mp_group = self.create_process_group(backend)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 104, in create_process_group
    dist.init_process_group(backend=backend)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/parallelformers/parallel/engine.py", line 104, in create_process_group
    dist.init_process_group(backend=backend)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 627, in init_process_group
    _store_based_barrier(rank, store, timeout)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 627, in init_process_group
    _store_based_barrier(rank, store, timeout)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 242, in _store_based_barrier
    worker_count = store.add(store_key, 0)
  File "/home/ubuntu/.venv/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 242, in _store_based_barrier
    worker_count = store.add(store_key, 0)
RuntimeError: Connection reset by peer
RuntimeError: Connection reset by peer

Now if I do

# import torch.multiprocessing as mp
# import multiprocess as mp
import multiprocessing as mp

and add the __main__ guard, it works fine.

Suggestion: Replace import multiprocess as mp with import multiprocessing as mp, and in the instructions tell users to wrap the parallelize call in an if __name__ == '__main__': guard.
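
For reference, here is a minimal sketch of the working layout described above: the reproduction script from the question restructured so that model loading, parallelize, and generation all sit under an if __name__ == '__main__': guard. It reuses the model name and generation arguments from the question; treating the guard as sufficient follows the comment above and may not hold on every setup.

# Reproduction script from the question, restructured with a __main__ guard so the
# worker processes started by parallelformers do not re-execute the module body.
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

def main():
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

    # Shard the model across two of the four A10G GPUs, as in the original script.
    parallelize(model, num_gpus=2, fp16=True, verbose="detail")

    inputs = tokenizer("Parallelformers is", return_tensors="pt")
    outputs = model.generate(
        **inputs,
        num_beams=5,
        no_repeat_ngram_size=4,
        max_length=15,
    )
    print(f"Output: {tokenizer.batch_decode(outputs)[0]}")

if __name__ == "__main__":
    main()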
