
arguments are located on different GPUs

See original GitHub issue

Hi BangLiu, thanks for this awesome code 😃

I want to run QANet with 4 GPUs, so I changed the related settings as follows:

parser.add_argument('--with_cuda', default=True)
parser.add_argument('--multi_gpu', default=True)

model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])
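
For reference, the wrapping pattern I'm following is the standard one; here is a minimal self-contained sketch with a hypothetical tiny model in place of QANet:

import torch
import torch.nn as nn

class TinyModel(nn.Module):
    # Hypothetical stand-in, just to show the DataParallel wrapping pattern.
    def __init__(self):
        super(TinyModel, self).__init__()
        self.fc = nn.Linear(16, 2)

    def forward(self, x):
        return self.fc(x)

model = nn.DataParallel(TinyModel().cuda(), device_ids=[0, 1, 2, 3])
x = torch.randn(32, 16).cuda()  # inputs start on device_ids[0]
out = model(x)                  # the batch is split along dim 0 across the GPUs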

I am using torch 0.4.0 and CUDA 9.0, and I hit the runtime error "arguments are located on different GPUs", as follows:

Traceback (most recent call last):
  File "QANet_main.py", line 633, in <module>
    p1, p2 = trainer.model(context_wids, context_cids, question_wids, question_cids)
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/fuyuwei/CODE/_QANet/model/QANet_andy.py", line 353, in forward
    Ce = self.emb_enc(C, maskC, 1, 1)
  File "/home/fuyuwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/fuyuwei/CODE/_QANet/model/QANet_andy.py", line 224, in forward
    out = PosEncoder(x)
  File "/home/fuyuwei/CODE/_QANet/model/QANet_andy.py", line 55, in PosEncoder
    return (x + signal.to(device)).transpose(1, 2)
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:233

It seems that the error arises in PosEncoder(), where the input x and signal.to(device) end up on different GPUs.
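
A minimal sketch of the change I'd expect to fix it (assuming device is a module-level variable in QANet_andy.py, which is my reading of the traceback): take the target device from x itself, so each DataParallel replica adds tensors on its own GPU.

import torch

def add_pos_signal(x, signal):
    # Move the signal to the device of this replica's input, rather than
    # to a fixed module-level `device` that always names a single GPU.
    return (x + signal.to(x.device)).transpose(1, 2)

Registering the signal as a buffer (self.register_buffer('signal', signal)) should also work, since DataParallel copies buffers to every replica.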

Could you please provide some clue to solve this problem? Many thanks.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

1 reaction
fuyw commented, Aug 29, 2018

I think you can use nvidia-smi to check your GPU usage, in case some other program is using the GPU.

And try batch_size = 4 or 8 to see if it works.
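
From Python you can also print what this process itself has allocated per GPU (a quick sketch; note it does not show other processes' memory, so nvidia-smi is still the authority there):

import torch

# Per-GPU memory held by this process's tensors (available since torch 0.4).
for i in range(torch.cuda.device_count()):
    mib = torch.cuda.memory_allocated(i) / 1024**2
    print('cuda:%d  %.1f MiB allocated' % (i, mib))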

1 reaction
fuyw commented, Aug 28, 2018

@rsanjaykamath, maybe you should decrease the batch size.

Read more comments on GitHub >

Top Results From Across the Web

DataParallel: Arguments are located on different GPUs
Hi, all, I met following error when using parallel_model = nn.DataParallel(model). Running on one GPU is fine. Traceback (most recent call ...
Read more >
RuntimeError: arguments are located on different GPUs #2084
Hi I ran into the following issue when training an RNN-T model with 8 GPUs. Both the encoder and decoder are stacked LSTMs....
Read more >
RuntimeError: arguments are located on different GPUs
I'm using AWS EC2 with 8 GPUs, so I've tried the same code with a machine with 1 GPU only and the code...
Read more >
Arguments are located on different GPUs when using nn ...
python - Arguments are located on different GPUs when using nn.DataParallel(model) - Stack Overflow ...
Read more >
Using gpus Efficiently for ML - CV-Tricks.com
When we run the above snippet, it crashes saying “arguments are located on different GPUs”. So let's make the changes to rectify the...
Read more >
