Getting an error on a multi-GPU machine
Description
My model works fine when I use gpu:0, but it gives an error when I use gpu:1.
I got this error:
Traceback (most recent call last):
File "zst_client.py", line 53, in <module>
run_inference('Jupiter’s Biggest Moons Started as Tiny Grains of Hail')
File "zst_client.py", line 39, in run_inference
response = triton_client.infer(model_name, model_version=model_version, inputs=[input0, input1], outputs=[output])
File "/usr/local/lib/python3.8/dist-packages/tritonclient/http/__init__.py", line 1102, in infer
_raise_if_error(response)
File "/usr/local/lib/python3.8/dist-packages/tritonclient/http/__init__.py", line 63, in _raise_if_error
raise error
tritonclient.utils.InferenceServerException: PyTorch execute failure: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__.py", line 12, in forward
input_ids = torch.to(data, dtype=4, layout=0, device=torch.device("cuda"), pin_memory=False, non_blocking=False, copy=False, memory_format=None)
attention_mask0 = torch.to(attention_mask, dtype=4, layout=0, device=torch.device("cuda"), pin_memory=False, non_blocking=False, copy=False, memory_format=None)
_1 = (_0).forward(input_ids, attention_mask0, )
~~~~~~~~~~~ <--- HERE
return (_1,)
File "code/__torch__/transformers/modeling_xlm_roberta.py", line 11, in forward
attention_mask: Tensor) -> Tensor:
_0 = self.classifier
_1 = (self.roberta).forward(input_ids, attention_mask, )
~~~~~~~~~~~~~~~~~~~~~ <--- HERE
return (_0).forward(_1, )
File "code/__torch__/transformers/modeling_roberta.py", line 21, in forward
_7 = torch.to(extended_attention_mask, 6, False, False, None)
attention_mask0 = torch.mul(torch.rsub(_7, 1., 1), CONSTANTS.c0)
_8 = (_0).forward((_1).forward(input_ids, input, ), attention_mask0, )
~~~~~~~~~~~ <--- HERE
return _8
class RobertaEmbeddings(Module):
File "code/__torch__/transformers/modeling_roberta.py", line 47, in forward
_16 = torch.add(_15, CONSTANTS.c1, alpha=1)
input0 = torch.to(_16, dtype=4, layout=0, device=torch.device("cuda:0"), pin_memory=False, non_blocking=False, copy=False, memory_format=None)
_17 = (_13).forward(input_ids, )
~~~~~~~~~~~~ <--- HERE
_18 = (_12).forward(input0, )
_19 = (_11).forward(input, )
File "code/__torch__/torch/nn/modules/sparse.py", line 8, in forward
def forward(self: __torch__.torch.nn.modules.sparse.Embedding,
input_ids: Tensor) -> Tensor:
inputs_embeds = torch.embedding(self.weight, input_ids, 1, False, False)
~~~~~~~~~~~~~~~ <--- HERE
return inputs_embeds
Traceback of TorchScript, original code (most recent call last):
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/functional.py(1814): embedding
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/sparse.py(124): forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(704): _slow_forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(720): _call_impl
/home/cmeena/.local/lib/python3.8/site-packages/transformers/modeling_roberta.py(117): forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(704): _slow_forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(720): _call_impl
/home/cmeena/.local/lib/python3.8/site-packages/transformers/modeling_roberta.py(674): forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(704): _slow_forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(720): _call_impl
/home/cmeena/.local/lib/python3.8/site-packages/transformers/modeling_roberta.py(989): forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(704): _slow_forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(720): _call_impl
robertamodelgpu.py(17): forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(704): _slow_forward
/home/cmeena/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(720): _call_impl
/home/cmeena/.local/lib/python3.8/site-packages/torch/jit/__init__.py(1109): trace_module
/home/cmeena/.local/lib/python3.8/site-packages/torch/jit/__init__.py(953): trace
robertamodelgpu.py(20): <module>
RuntimeError: Input, output and indices must be on the current device
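Note how the serialized code above contains device literals: torch.device("cuda") and torch.device("cuda:0") were baked into the graph by the .cuda() calls made at trace time. When Triton loads the model onto gpu:1, the weights sit on cuda:1 while the traced casts still move tensors to cuda:0, which matches the "Input, output and indices must be on the current device" failure. As a quick check (a sketch, not part of the original report), the hard-coded devices can be seen by inspecting the traced module:

import torch

# Load the traced model onto the CPU and print its TorchScript source;
# look for torch.device(...) literals recorded during tracing.
traced = torch.jit.load("mode-tuplel.pt", map_location="cpu")
print(traced.code)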
Triton Information
What version of Triton are you using? 21.02
Are you using the Triton container or did you build it yourself? I am using the Triton container.
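For reference, the 21.02 container is typically launched with all GPUs exposed, along the lines of the command below (a sketch; the model-repository path is an assumption, not from the original issue):

docker run --gpus=all --rm -p8000:8000 -p8001:8001 -p8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:21.02-py3 \
  tritonserver --model-repository=/models

If only one GPU is exposed to the container, gpus: [ 1 ] in the model config would fail for a different reason, so it is worth verifying that both devices are visible inside the container.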
To Reproduce
I used the model from this blog post to create this experiment: https://medium.com/nvidia-ai/how-to-deploy-almost-any-hugging-face-model-on-nvidia-triton-inference-server-with-an-8ee7ec0e6fc4
Use this model:
import torch
from transformers import XLMRobertaForSequenceClassification, XLMRobertaTokenizer

R_tokenizer = XLMRobertaTokenizer.from_pretrained('joeddav/xlm-roberta-large-xnli')

premise = 'Jupiters Biggest Moons Started as Tiny Grains of Hail'
hypothesis = 'This text is about space and cosmos'

# Encode the premise/hypothesis pair and build an attention mask.
input_ids = R_tokenizer.encode(premise, hypothesis, return_tensors='pt',
                               max_length=256, truncation=True, padding='max_length')
mask = input_ids != -1
mask = mask.long()

class PyTorch_to_TorchScript(torch.nn.Module):
    def __init__(self):
        super(PyTorch_to_TorchScript, self).__init__()
        self.model = XLMRobertaForSequenceClassification.from_pretrained(
            'joeddav/xlm-roberta-large-xnli').cuda()

    def forward(self, data, attention_mask=None):
        # These .cuda() calls pin the inputs to the default CUDA device
        # (cuda:0) at trace time.
        return tuple(self.model(data.cuda(), attention_mask.cuda()))

pt_model = PyTorch_to_TorchScript().eval()
traced_script_module = torch.jit.trace(pt_model, (input_ids, mask), strict=False)
traced_script_module.save("mode-tuplel.pt")
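The .cuda() calls in forward are what bake cuda:0 into the traced graph. As a possible workaround (a sketch of my suggestion, not from the original issue), drop the explicit .cuda() calls and trace on CPU so the saved graph carries no device literals; the pytorch_libtorch backend can then place the model on whichever GPU the instance_group specifies. DeviceAgnosticWrapper and the model.pt file name are illustrative:

import torch
from transformers import XLMRobertaForSequenceClassification, XLMRobertaTokenizer

R_tokenizer = XLMRobertaTokenizer.from_pretrained('joeddav/xlm-roberta-large-xnli')
input_ids = R_tokenizer.encode('Jupiters Biggest Moons Started as Tiny Grains of Hail',
                               'This text is about space and cosmos',
                               return_tensors='pt', max_length=256,
                               truncation=True, padding='max_length')
mask = (input_ids != -1).long()

class DeviceAgnosticWrapper(torch.nn.Module):
    # No .cuda() inside forward: tracing on CPU records no device
    # literals, so the model is free to run on any GPU Triton assigns.
    def __init__(self):
        super().__init__()
        self.model = XLMRobertaForSequenceClassification.from_pretrained(
            'joeddav/xlm-roberta-large-xnli')

    def forward(self, data, attention_mask):
        return tuple(self.model(data, attention_mask))

pt_model = DeviceAgnosticWrapper().eval()
traced = torch.jit.trace(pt_model, (input_ids, mask), strict=False)
traced.save("model.pt")  # place under the Triton model repository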
Try these two configs. First, with the model instance on gpu:0:
name: "zst"
platform: "pytorch_libtorch"
input [
  {
    name: "input__0"
    data_type: TYPE_INT32
    dims: [ -1, -1 ]
  },
  {
    name: "input__1"
    data_type: TYPE_INT32
    dims: [ -1, -1 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
Then the same config with the instance on gpu:1:
name: "zst"
platform: "pytorch_libtorch"
input [
  {
    name: "input__0"
    data_type: TYPE_INT32
    dims: [ -1, -1 ]
  },
  {
    name: "input__1"
    data_type: TYPE_INT32
    dims: [ -1, -1 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 1 ]
  }
]
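As a side note, a single instance_group can also list both devices, which makes Triton create one instance per listed GPU (a sketch, not a config from the original issue; it hits the same error as long as the traced model pins tensors to cuda:0):

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]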
Use a machine that has two or more GPUs.
You will find that it works fine for gpu:0 but gives the error above for gpu:1.
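The client script from the traceback (zst_client.py) is not included in the issue; below is a minimal sketch of what it presumably looks like, using the tritonclient HTTP API. The tensor names input__0/input__1/output__0 come from the config above, the tokenization mirrors the repro script, and the server URL and model version are assumptions:

import numpy as np
import tritonclient.http as httpclient
from transformers import XLMRobertaTokenizer

R_tokenizer = XLMRobertaTokenizer.from_pretrained('joeddav/xlm-roberta-large-xnli')
triton_client = httpclient.InferenceServerClient(url='localhost:8000')

def run_inference(premise, model_name='zst', model_version='1'):
    hypothesis = 'This text is about space and cosmos'
    input_ids = R_tokenizer.encode(premise, hypothesis, return_tensors='pt',
                                   max_length=256, truncation=True,
                                   padding='max_length')
    mask = (input_ids != -1).long()

    # Shapes and dtypes must match the config: two [-1,-1] INT32 inputs.
    input0 = httpclient.InferInput('input__0', list(input_ids.shape), 'INT32')
    input0.set_data_from_numpy(input_ids.numpy().astype(np.int32))
    input1 = httpclient.InferInput('input__1', list(mask.shape), 'INT32')
    input1.set_data_from_numpy(mask.numpy().astype(np.int32))
    output = httpclient.InferRequestedOutput('output__0')

    response = triton_client.infer(model_name, model_version=model_version,
                                   inputs=[input0, input1], outputs=[output])
    return response.as_numpy('output__0')

run_inference('Jupiter’s Biggest Moons Started as Tiny Grains of Hail')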
The model is a TorchScript XLM-RoBERTa sequence-classification model served with the pytorch_libtorch backend, with two variable-shape INT32 inputs (input__0, input__1) and one FP32 output (output__0); both model configuration files are included above.
Expected behavior
The model should work on whichever GPU the instance group assigns, on a machine with multiple GPU devices.
Top GitHub Comments
@chandrameenamohan closing due to inactivity.
same error