Multi-GPU error

See original GitHub issue.

```
CUDA_VISIBLE_DEVICES=6,7 python train.py --PCB --batchsize 60 --name PCB-64 --train_all
```
Since I use multiple GPUs, I added this code:

```python
if torch.cuda.device_count() > 1 and use_gpu:
    model_wrapped = nn.DataParallel(model).cuda()
    model = model_wrapped
```
but the forward pass fails with:

```
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1513368888240/work/torch/lib/THC/THCTensorCopy.cu:204
```
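The thread does not spell out the accepted fix, but a common cause of this error with `nn.DataParallel` is a device mismatch: part of the model (or a tensor created inside `forward`) is pinned to a fixed GPU while the replicas run on other devices. A minimal sketch of the usual safe ordering, using a hypothetical `TinyNet` as a stand-in for the PCB model (it falls back to plain CPU execution when no GPU is visible):

```python
import torch
import torch.nn as nn

# Hypothetical minimal model standing in for the PCB network.
class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        # Do not hard-code a device here (e.g. x.cuda(0)):
        # each DataParallel replica runs on its own GPU, and
        # forcing tensors onto GPU 0 inside forward is a classic
        # source of "illegal memory access" errors.
        return self.fc(x)

use_gpu = torch.cuda.is_available()
model = TinyNet()

if use_gpu:
    model = model.cuda()            # place parameters on the default GPU first
if use_gpu and torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # then replicate across all visible GPUs

x = torch.randn(8, 16)
if use_gpu:
    x = x.cuda()                    # inputs go to the default device;
                                    # DataParallel scatters them itself
out = model(x)
print(out.shape)                    # torch.Size([8, 4])
```

Wrapping must also happen before the optimizer is built, so the optimizer sees the (possibly replicated) module's parameters; and with `CUDA_VISIBLE_DEVICES=6,7` the batch size (60 here) should be divisible by the number of visible GPUs so the scatter step is even.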
Issue Analytics
- Created: 5 years ago
- Comments: 6 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
This is my code; you may refer to it and adapt your code accordingly.
Thanks for your reply, and it solved my problem!