arguments are located on different GPUs
Hi BangLiu, thanks for this awesome code 😃
I want to run QANet with 4 GPUs, so I changed some related settings as follows:
parser.add_argument('--with_cuda', default=True)
parser.add_argument('--multi_gpu', default=True)
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])
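For context, here is a minimal runnable sketch of this DataParallel setup (illustrative names and a stand-in module, not the repo's exact code): the wrapped module's parameters need to sit on device_ids[0] before the first forward pass, and DataParallel then scatters each batch across the listed GPUs.

import torch
import torch.nn as nn

# Illustrative stand-in for the QANet model; any nn.Module works the same way.
model = nn.Linear(128, 128)

# Parameters must live on device_ids[0] (cuda:0 here) before wrapping;
# DataParallel replicates the module onto the other GPUs at forward time.
device = torch.device("cuda:0")
model = model.to(device)
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])

# Inputs may sit on cuda:0 (or CPU); DataParallel scatters them along dim 0.
x = torch.randn(32, 128, device=device)
out = model(x)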
I am using torch 0.4.0 and CUDA 9.0, and I run into a RuntimeError, "arguments are located on different GPUs", as follows:
Traceback (most recent call last):
File "QANet_main.py", line 633, in <module>
p1, p2 = trainer.model(context_wids, context_cids, question_wids, question_cids)
File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
raise output
File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
output = module(*input, **kwargs)
File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/fuyuwei/CODE/_QANet/model/QANet_andy.py", line 353, in forward
Ce = self.emb_enc(C, maskC, 1, 1)
File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/fuyuwei/CODE/_QANet/model/QANet_andy.py", line 224, in forward
out = PosEncoder(x)
File "/home/fuyuwei/CODE/_QANet/model/QANet_andy.py", line 55, in PosEncoder
return (x + signal.to(device)).transpose(1, 2)
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:233
It seems that the error comes from PosEncoder(), where the input x and signal.to(device) end up on different GPUs.
Could you please provide some clue to solve this problem? Many thanks.
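For reference, this error usually means a device captured once at import time (typically cuda:0) is used inside forward(), while nn.DataParallel runs replicas on the other GPUs. Below is a minimal, self-contained sketch of a device-agnostic PosEncoder using the standard sinusoidal signal; it is an assumption-based rewrite, not the repo's exact function, and the key point is building the signal with device=x.device.

import math
import torch
import torch.nn.functional as F

def PosEncoder(x, min_timescale=1.0, max_timescale=1.0e4):
    # x: (batch, channels, length). Build the sinusoidal signal on x's own
    # device so every DataParallel replica adds a signal living on its GPU,
    # instead of relying on a globally captured `device`.
    x = x.transpose(1, 2)                      # (batch, length, channels)
    length, channels = x.size(1), x.size(2)
    position = torch.arange(length, dtype=torch.float32, device=x.device)
    num_timescales = channels // 2
    log_timescale_increment = (
        math.log(max_timescale / min_timescale) / max(num_timescales - 1, 1)
    )
    inv_timescales = min_timescale * torch.exp(
        torch.arange(num_timescales, dtype=torch.float32, device=x.device)
        * -log_timescale_increment
    )
    scaled_time = position.unsqueeze(1) * inv_timescales.unsqueeze(0)
    signal = torch.cat([torch.sin(scaled_time), torch.cos(scaled_time)], dim=1)
    signal = F.pad(signal, (0, channels % 2, 0, 0))   # pad if channels is odd
    signal = signal.view(1, length, channels)         # broadcast over the batch
    return (x + signal).transpose(1, 2)               # back to (batch, channels, length)

An equivalent one-line change in the existing function would be replacing signal.to(device) with signal.to(x.device), so the signal always follows its input's GPU.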
Issue Analytics
- State:
- Created 5 years ago
- Comments: 7 (1 by maintainers)
Top Results From Across the Web

DataParallel: Arguments are located on different GPUs
Hi, all, I met following error when using parallel_model = nn.DataParallel(model). Running on one GPU is fine. Traceback (most recent call ...

RuntimeError: arguments are located on different GPUs #2084
Hi I ran into the following issue when training an RNN-T model with 8 GPUs. Both the encoder and decoder are stacked LSTMs....

RuntimeError: arguments are located on different GPUs
I'm using AWS EC2 with 8 GPUs, so I've tried the same code with a machine with 1 GPU only and the code...

Arguments are located on different GPUs when using nn ...
python - Arguments are located on different GPUs when using nn.DataParallel(model) - Stack Overflow ...

Using gpus Efficiently for ML - CV-Tricks.com
When we run the above snippet, it crashes saying "arguments are located on different GPUs". So lets make the changes to rectify the...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I think you can use nvidia-smi to check your GPU usage, in case some other program is using the GPU. And try batch_size = 4 or 8 to see if it works.
@rsanjaykamath maybe you should decrease the batch size