Not able to load the BertForSequenceClassification model from huggingface
Description
Hi, I am not able to load the BertForSequenceClassification model from Hugging Face in Triton Inference Server.
Triton Information
v21.09
Are you using the Triton container or did you build it yourself?
I am using the Triton container nvcr.io/nvidia/tritonserver:21.09-py3.
To Reproduce
I used the following script to convert the BERT model to a traced TorchScript model, which I then wanted to load in Triton Inference Server.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Dummy inputs for tracing: token ids and an attention mask derived from them.
input_ids = torch.tensor([[2, 3, 4, 5]]).long()
mask = input_ids != 1
mask = mask.long()

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

    def forward(self, data, attention_mask):
        out2 = self.model(data, attention_mask=attention_mask)
        return out2[0]  # return only the logits so the traced output is a plain Tensor

pt_model = MyModel().eval()
print(pt_model(input_ids, mask))

traced_model = torch.jit.trace(pt_model, (input_ids, mask))
traced_model.save("model.pt")
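As a sanity check before deploying, it can help to reload the traced artifact and note the exact PyTorch version used for tracing, since the server's libtorch must be able to resolve every op serialized into model.pt. A minimal sketch, continuing from the script above:

import torch

# Record the tracing environment; the Triton container bundles its own libtorch.
print(torch.__version__)

# Reloading exercises the serialized graph with the local PyTorch build.
reloaded = torch.jit.load("model.pt")
print(reloaded(input_ids, mask))  # should match the eager output printed above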
The config.pbtxt file is:
name: "bert"
platform: "pytorch_libtorch"
input [
{
name: "input__0"
data_type: TYPE_INT32
dims: [1, 512]
},
input [
{
name: "input__1"
data_type: TYPE_INT32
dims: [1, 512]
}
]
output {
name: "output__0"
data_type: TYPE_FP32
dims: [1, 2]
}
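One detail worth double-checking: the tracing script feeds .long() tensors, i.e. int64, while the config declares TYPE_INT32. If the traced model expects int64 inputs, the input entries would need TYPE_INT64 instead, along these lines:

input [
  {
    name: "input__0"
    data_type: TYPE_INT64
    dims: [1, 512]
  },
  {
    name: "input__1"
    data_type: TYPE_INT64
    dims: [1, 512]
  }
]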
I then ran the following command:
sudo docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/rkoy/server/model_repository:/models nvcr.io/nvidia/tritonserver:21.09-py3 tritonserver --model-repository=/models
I got this error:
UNAVAILABLE: Internal: failed to load model 'bert':
Arguments for call are not valid.
The following variants are available:

  aten::gelu(Tensor self, bool approximate) -> (Tensor):
  Argument approximate not provided.

  aten::gelu.out(Tensor self, bool approximate, *, Tensor(a!) out) -> (Tensor(a!)):
  Argument approximate not provided.

The original call is:
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py(1313): gelu
/data/rkoy/anaconda3/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py(425): forward
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(534): _slow_forward
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(548): __call__
/data/rkoy/anaconda3/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py(523): feed_forward_chunk
/data/rkoy/anaconda3/lib/python3.8/site-packages/transformers/modeling_utils.py(2349): apply_chunking_to_forward
/data/rkoy/anaconda3/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py(511): forward
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(534): _slow_forward
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(548): __call__
/data/rkoy/anaconda3/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py(583): forward
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(534): _slow_forward
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(548): __call__
/data/rkoy/anaconda3/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py(996): forward
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(534): _slow_forward
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(548): __call__
/data/rkoy/anaconda3/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py(1530): forward
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(534): _slow_forward
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(548): __call__
convert_pytorch_model_to_jit.py(17): forward
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(534): _slow_forward
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(548): __call__
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/jit/__init__.py(1027): trace_module
/data/rkoy/anaconda3/lib/python3.8/site-packages/torch/jit/__init__.py(873): trace
convert_pytorch_model_to_jit.py(25): <module>

Serialized File "code/__torch__/transformers/models/bert/modeling_bert.py", line 178
  def forward(self: __torch__.transformers.models.bert.modeling_bert.BertIntermediate,
      argument_1: Tensor) -> Tensor:
    input = torch.gelu((self.dense).forward(argument_1, ))
            ~~~~~~~~~~ <--- HERE
    return input
class BertOutput(Module):
Expected behavior
I expected the BERT model to load successfully, without any error. I would like to know what mistake I am making, and a solution for it.
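A likely cause is that the PyTorch build used for tracing and the libtorch inside the Triton container disagree on the aten::gelu schema. One way to rule that out is to re-run the tracing script with a PyTorch build matching the Triton release, e.g. inside the NGC PyTorch container of the same tag. A sketch, where trace_bert.py is a hypothetical name for the conversion script above:

sudo docker run --rm -v $PWD:/workspace nvcr.io/nvidia/pytorch:21.09-py3 \
    python trace_bert.py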
Top GitHub Comments
Closing issue due to lack of activity. Please re-open the issue if you would like to follow up.
Hi. Having the same problem running nvcr.io/nvidia/tritonserver:21.11-py3 and transformers==2.10 (I know it's kinda old). I have managed to "solve" this by removing the if in the transformers code here and just hard-coding gelu = _gelu_python.
This allowed me to run the model, and it even seems to run fine. There are some small differences between the predictions of the PyTorch model and the TritonServer one, but they are negligible in my case. The speed seems almost the same; in my case the Triton server is even sometimes slower.
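A rough sketch of that workaround without editing the installed package: monkey-patch the activation before building the model, so tracing records primitive ops (erf, mul, add) instead of a single aten::gelu call. The names transformers.activations and ACT2FN are assumptions; the module layout varies across transformers versions, and some versions look the activation up elsewhere:

import math

import torch
import transformers.activations as activations

def _gelu_python(x):
    # Pure-Python GELU; traces to erf/mul/add instead of aten::gelu.
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

# Assumption: BERT resolves its hidden activation through ACT2FN["gelu"].
# Apply this patch before calling BertForSequenceClassification.from_pretrained.
activations.ACT2FN["gelu"] = _gelu_python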
But I have only been testing with the dynamic batching option turned off, and I already had some internal code that turns text into torch.Tensor, so in order to use the Triton model I have to convert every Tensor into a numpy array first, plus pay the gRPC delay. In my case, a PyTorch version takes ~1.61 seconds to run 1973 items (batches of size 100), while Triton takes ~1.64 seconds. However, when we split the data into randomly sized (1-100) batches the times differ more: 2.384 seconds for PyTorch vs 2.75 for Triton. The random seed used to split the batches was fixed, and I ran 10 experiments; these are the mean values. The std is quite low, 0.02 seconds in both cases.
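For reference, the tensor-to-numpy conversion mentioned above looks roughly like this with the tritonclient gRPC API; the server address and the INT64 dtypes follow the config discussion earlier and are assumptions:

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# torch.Tensor inputs have to be converted to numpy arrays before sending.
ids = np.zeros((1, 512), dtype=np.int64)
mask = np.ones((1, 512), dtype=np.int64)

inputs = [
    grpcclient.InferInput("input__0", list(ids.shape), "INT64"),
    grpcclient.InferInput("input__1", list(mask.shape), "INT64"),
]
inputs[0].set_data_from_numpy(ids)
inputs[1].set_data_from_numpy(mask)

result = client.infer(model_name="bert", inputs=inputs)
logits = result.as_numpy("output__0")  # shape (1, 2)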