
Getting error in a multi-GPU machine

See original GitHub issue

Description: My model works fine when I run it on gpu:0, but it gives an error when I use gpu:1.

I got this error:

Traceback (most recent call last):
  File "zst_client.py", line 53, in <module>
    run_inference('Jupiter’s Biggest Moons Started as Tiny Grains of Hail')
  File "zst_client.py", line 39, in run_inference
    response = triton_client.infer(model_name,         model_version=model_version, inputs=[input0, input1], outputs=[output])
  File "/usr/local/lib/python3.8/dist-packages/tritonclient/http/__init__.py", line 1102, in infer
    _raise_if_error(response)
  File "/usr/local/lib/python3.8/dist-packages/tritonclient/http/__init__.py", line 63, in _raise_if_error
    raise error
tritonclient.utils.InferenceServerException: PyTorch execute failure: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__.py", line 12, in forward
    input_ids = torch.to(data, dtype=4, layout=0, device=torch.device("cuda"), pin_memory=False, non_blocking=False, copy=False, memory_format=None)
    attention_mask0 = torch.to(attention_mask, dtype=4, layout=0, device=torch.device("cuda"), pin_memory=False, non_blocking=False, copy=False, memory_format=None)
    _1 = (_0).forward(input_ids, attention_mask0, )
          ~~~~~~~~~~~ <--- HERE
    return (_1,)
  File "code/__torch__/transformers/modeling_xlm_roberta.py", line 11, in forward
    attention_mask: Tensor) -> Tensor:
    _0 = self.classifier
    _1 = (self.roberta).forward(input_ids, attention_mask, )
          ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    return (_0).forward(_1, )
  File "code/__torch__/transformers/modeling_roberta.py", line 21, in forward
    _7 = torch.to(extended_attention_mask, 6, False, False, None)
    attention_mask0 = torch.mul(torch.rsub(_7, 1., 1), CONSTANTS.c0)
    _8 = (_0).forward((_1).forward(input_ids, input, ), attention_mask0, )
                       ~~~~~~~~~~~ <--- HERE
    return _8
class RobertaEmbeddings(Module):
  File "code/__torch__/transformers/modeling_roberta.py", line 47, in forward
    _16 = torch.add(_15, CONSTANTS.c1, alpha=1)
    input0 = torch.to(_16, dtype=4, layout=0, device=torch.device("cuda:0"), pin_memory=False, non_blocking=False, copy=False, memory_format=None)
    _17 = (_13).forward(input_ids, )
           ~~~~~~~~~~~~ <--- HERE
    _18 = (_12).forward(input0, )
    _19 = (_11).forward(input, )
  File "code/__torch__/torch/nn/modules/sparse.py", line 8, in forward
  def forward(self: __torch__.torch.nn.modules.sparse.Embedding,
    input_ids: Tensor) -> Tensor:
    inputs_embeds = torch.embedding(self.weight, input_ids, 1, False, False)
                    ~~~~~~~~~~~~~~~ <--- HERE
    return inputs_embeds

Traceback of TorchScript, original code (most recent call last):
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/functional.py(1814): embedding
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/sparse.py(124): forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(704): _slow_forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(720): _call_impl
/home/cmeena/.local/lib/python3.8/site-packages/transformers/modeling_roberta.py(117): forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(704): _slow_forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(720): _call_impl
/home/cmeena/.local/lib/python3.8/site-packages/transformers/modeling_roberta.py(674): forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(704): _slow_forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(720): _call_impl
/home/cmeena/.local/lib/python3.8/site-packages/transformers/modeling_roberta.py(989): forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(704): _slow_forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(720): _call_impl
robertamodelgpu.py(17): forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(704): _slow_forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(720): _call_impl
/home/cmeena/.local/lib/python3.8/site-packages/torch/jit/__init__.py(1109): trace_module
/home/cmeena/.local/lib/python3.8/site-packages/torch/jit/__init__.py(953): trace
robertamodelgpu.py(20): <module>
RuntimeError: Input, output and indices must be on the current device

Triton Information: version 21.02, using the Triton container (not built from source).

To Reproduce: I used the model from this blog post to build this experiment: https://medium.com/nvidia-ai/how-to-deploy-almost-any-hugging-face-model-on-nvidia-triton-inference-server-with-an-8ee7ec0e6fc4. Trace and save the model as follows:

import torch
from transformers import XLMRobertaForSequenceClassification, XLMRobertaTokenizer

R_tokenizer = XLMRobertaTokenizer.from_pretrained('joeddav/xlm-roberta-large-xnli')
premise = 'Jupiters Biggest Moons Started as Tiny Grains of Hail'
hypothesis = 'This text is about space and cosmos'

input_ids = R_tokenizer.encode(premise, hypothesis, return_tensors='pt', max_length=256, truncation=True, padding='max_length')
mask = input_ids != -1
mask = mask.long()

class PyTorch_to_TorchScript(torch.nn.Module):
    def __init__(self):
        super(PyTorch_to_TorchScript, self).__init__()
        # .cuda() moves the weights to the current default device (cuda:0)
        self.model = XLMRobertaForSequenceClassification.from_pretrained('joeddav/xlm-roberta-large-xnli').cuda()
    def forward(self, data, attention_mask=None):
        # .cuda() here is evaluated at trace time and recorded as a device
        # literal in the TorchScript graph
        return tuple(self.model(data.cuda(), attention_mask.cuda()))

pt_model = PyTorch_to_TorchScript().eval()
traced_script_module = torch.jit.trace(pt_model, (input_ids, mask), strict=False)
traced_script_module.save("mode-tuplel.pt")
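
Note that this trace bakes CUDA device literals into the serialized model: the TorchScript traceback above shows torch.device("cuda:0") inside code/__torch__/transformers/modeling_roberta.py, so when Triton places an instance on gpu:1 the inputs land on one device while the embedded literals still point at cuda:0. A minimal workaround sketch (my own assumption, not a fix confirmed by the maintainers in this thread) is to trace the model on the CPU with no .cuda() calls in forward, so no CUDA device literal is recorded and the loaded module can be moved to whichever GPU the instance_group specifies. Here input_ids and mask are the tensors built in the snippet above, and the class name PyTorch_to_TorchScript_CPU is hypothetical:

import torch
from transformers import XLMRobertaForSequenceClassification

class PyTorch_to_TorchScript_CPU(torch.nn.Module):
    def __init__(self):
        super(PyTorch_to_TorchScript_CPU, self).__init__()
        # Keep the model on the CPU during tracing so tracing records
        # no CUDA device literals in the graph.
        self.model = XLMRobertaForSequenceClassification.from_pretrained('joeddav/xlm-roberta-large-xnli')
    def forward(self, data, attention_mask=None):
        # No .cuda() calls; device placement is left to the runtime.
        return tuple(self.model(data, attention_mask))

pt_model = PyTorch_to_TorchScript_CPU().eval()
traced_script_module = torch.jit.trace(pt_model, (input_ids, mask), strict=False)
# Inspect the traced code: it should no longer contain torch.device("cuda...")
print(traced_script_module.code)
traced_script_module.save("mode-tuplel.pt")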

Try these two configs. First, use this config, which places the model on GPU 0:

name: "zst"
platform: "pytorch_libtorch"
input [
 {
    name: "input__0"
    data_type: TYPE_INT32
    dims: [ -1,-1 ]
  } ,
{
    name: "input__1"
    data_type: TYPE_INT32
    dims: [ -1,-1 ]
  }
]
output {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [-1, -1]
  }
  instance_group [
    {
      count: 1
      kind: KIND_GPU
      gpus: [ 0 ]
    }
  ]

Then use this config, identical except that it places the model on GPU 1:

name: "zst"
platform: "pytorch_libtorch"
input [
 {
    name: "input__0"
    data_type: TYPE_INT32
    dims: [ -1,-1 ]
  } ,
{
    name: "input__1"
    data_type: TYPE_INT32
    dims: [ -1,-1 ]
  }
]
output {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [-1, -1]
  }
  instance_group [
    {
      count: 1
      kind: KIND_GPU
      gpus: [ 1 ]
    }
  ]

Use a machine that has 2 or more GPUs.

You will find that it works fine for gpu:0 but gives the error above for gpu:1.
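
For completeness, the zst_client.py referenced in the traceback can be reduced to a minimal sketch like the one below. The server URL localhost:8000, the default hypothesis string, and the final print call are assumptions on my part; the model name and the input/output names and types match the configs above:

import numpy as np
import tritonclient.http as httpclient
from transformers import XLMRobertaTokenizer

R_tokenizer = XLMRobertaTokenizer.from_pretrained('joeddav/xlm-roberta-large-xnli')
triton_client = httpclient.InferenceServerClient(url='localhost:8000')

def run_inference(premise, hypothesis='This text is about space and cosmos'):
    input_ids = R_tokenizer.encode(premise, hypothesis, return_tensors='pt',
                                   max_length=256, truncation=True,
                                   padding='max_length')
    mask = (input_ids != -1).long()

    # Input/output names and INT32/FP32 types must match config.pbtxt.
    input0 = httpclient.InferInput('input__0', list(input_ids.shape), 'INT32')
    input0.set_data_from_numpy(input_ids.numpy().astype(np.int32))
    input1 = httpclient.InferInput('input__1', list(mask.shape), 'INT32')
    input1.set_data_from_numpy(mask.numpy().astype(np.int32))
    output = httpclient.InferRequestedOutput('output__0')

    response = triton_client.infer('zst', inputs=[input0, input1],
                                   outputs=[output])
    return response.as_numpy('output__0')

print(run_inference('Jupiters Biggest Moons Started as Tiny Grains of Hail'))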


Expected behavior: I want the model to work on any of the GPU devices in a multi-GPU machine, not just gpu:0.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
CoderHam commented, Apr 20, 2021

@chandrameenamohan closing due to inactivity.

0 reactions
yjegssl commented, May 13, 2021

same error

Read more comments on GitHub >

Top Results From Across the Web

Error occurs when saving model in multi-gpu settings
I'm finetuning a language model on multiple GPUs. However, I met some problems with saving the model. After saving the model using ...

Problems with multi-gpus - MATLAB Answers - MathWorks
Learn more about multi GPUs. ... no problem training with a single GPU, but when I try to train with multiple GPUs, MATLAB ...

Getting error in multi-gpu training with pytorch lightning
The below code works on a single GPU but throws an error while using multiple GPUs: RuntimeError: grad can be implicitly created only ...

Graphics Processing Unit (GPU) — PyTorch Lightning 1.6.2 ...
Make sure you're running on a machine with at least one GPU. There's no need to specify any NVIDIA flags as Lightning will ...

Multi GPU Model Training: Monitoring and Optimizing
The backward pass passes the errors through the layers of the network ... We can train a model on a single machine having ...
