
arguments are located on different GPUs

See original GitHub issue

Hi BangLiu, thanks for this awesome code 😃

I want to run QANet with 4 GPUs, so I changed the related settings as follows:

parser.add_argument('--with_cuda', default=True)
parser.add_argument('--multi_gpu', default=True)

model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])
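
For reference, the wrapping pattern I'm following is the standard one; here is a minimal self-contained sketch with a hypothetical tiny model in place of QANet:

import torch
import torch.nn as nn

class TinyModel(nn.Module):
    # Hypothetical stand-in, just to show the DataParallel wrapping pattern.
    def __init__(self):
        super(TinyModel, self).__init__()
        self.fc = nn.Linear(16, 2)

    def forward(self, x):
        return self.fc(x)

model = nn.DataParallel(TinyModel().cuda(), device_ids=[0, 1, 2, 3])
x = torch.randn(32, 16).cuda()  # inputs start on device_ids[0]
out = model(x)                  # the batch is split along dim 0 across the GPUs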

I am using torch 0.4.0 and CUDA 9.0, and I hit the runtime error "arguments are located on different GPUs", as follows:

Traceback (most recent call last):
  File "QANet_main.py", line 633, in <module>
    p1, p2 = trainer.model(context_wids, context_cids, question_wids, question_cids)
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/fuyuwei/CODE/_QANet/model/QANet_andy.py", line 353, in forward
    Ce = self.emb_enc(C, maskC, 1, 1)
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/fuyuwei/CODE/_QANet/model/QANet_andy.py", line 224, in forward
    out = PosEncoder(x)
  File "/home/fuyuwei/CODE/_QANet/model/QANet_andy.py", line 55, in PosEncoder
    return (x + signal.to(device)).transpose(1, 2)
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:233

It seems that the error arises in PosEncoder(), where the input x and signal.to(device) end up on different GPUs.
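
A minimal sketch of the change I'd expect to fix it (assuming device is a module-level variable in QANet_andy.py, which is my reading of the traceback): take the target device from x itself, so each DataParallel replica adds tensors on its own GPU.

import torch

def add_pos_signal(x, signal):
    # Move the signal to the device of this replica's input, rather than
    # to a fixed module-level `device` that always names a single GPU.
    return (x + signal.to(x.device)).transpose(1, 2)

Registering the signal as a buffer (self.register_buffer('signal', signal)) should also work, since DataParallel copies buffers to every replica.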

Could you please provide some clue to solve this problem? Many thanks.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

1 reaction
fuyw commented, Aug 29, 2018

I think you can use nvidia-smi to check your GPU usage, in case some other program is using the GPU.

And try batch_size = 4 or 8 to see if it works.
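
From Python you can also print what this process itself has allocated per GPU (a quick sketch; note it does not show other processes' memory, so nvidia-smi is still the authority there):

import torch

# Per-GPU memory held by this process's tensors (available since torch 0.4).
for i in range(torch.cuda.device_count()):
    mib = torch.cuda.memory_allocated(i) / 1024**2
    print('cuda:%d  %.1f MiB allocated' % (i, mib))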

1 reaction
fuyw commented, Aug 28, 2018

@rsanjaykamath, maybe you should decrease the batch size.

Read more comments on GitHub >

Top Results From Across the Web

DataParallel: Arguments are located on different GPUs
Hi, all, I met following error when using parallel_model = nn.DataParallel(model). Running on one GPU is fine. Traceback (most recent call ...
Read more >
RuntimeError: arguments are located on different GPUs #2084
Hi I ran into the following issue when training an RNN-T model with 8 GPUs. Both the encoder and decoder are stacked LSTMs....
Read more >
RuntimeError: arguments are located on different GPUs
I'm using AWS EC2 with 8 GPUs, so I've tried the same code with a machine with 1 GPU only and the code...
Read more >
Arguments are located on different GPUs when using nn ...
python - Arguments are located on different GPUs when using nn.DataParallel(model) - Stack Overflow ...
Read more >
Using gpus Efficiently for ML - CV-Tricks.com
When we run the above snippet, it crashes saying “arguments are located on different GPUs”. So let's make the changes to rectify the...
Read more >
