
Onnx converted model has its output shape modified when compared to original (finetuned) model

See original GitHub issue

🐛 Bug

Information

Model I am using (Bert, XLNet …): mrm8488/distilroberta-base-finetuned-sentiment from the hub

Language I am using the model on (English, Chinese …): English

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

I use the 04-onnx-export.ipynb notebook and have only changed the model name and the tokenizer.

The issue appeared with every fine-tuned model I tried, whether for classification or multiple-choice questions.

The task I am working on is:

  • my own task or dataset: (give details below)
  • an official GLUE/SQUaD task: classification

To reproduce

Steps to reproduce the behavior:

Import AutoTokenizer and AutoModelForSequenceClassification and change the tokenizer and model names. The section we are interested in:

# ...
!rm -rf onnx/
from transformers.convert_graph_to_onnx import convert

# Handles all the above steps for you
convert(framework="pt", model="mrm8488/distilroberta-base-finetuned-sentiment", output="onnx/bert-base-cased.onnx", opset=11)
# ...

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("mrm8488/distilroberta-base-finetuned-sentiment")
cpu_model = create_model_for_provider("onnx/bert-base-cased.onnx", "CPUExecutionProvider")

# Inputs are provided through numpy array
model_inputs = tokenizer.encode_plus("My name is Bert", return_tensors="pt")
inputs_onnx = {k: v.cpu().detach().numpy() for k, v in model_inputs.items()}

# Run the model (None = get all the outputs)
sequence, pooled = cpu_model.run(None, inputs_onnx)

# Print information about outputs

print(f"Sequence output: {sequence.shape}, Pooled output: {pooled.shape}")

pytorch_model = AutoModelForSequenceClassification.from_pretrained("mrm8488/distilroberta-base-finetuned-sentiment")
a, = pytorch_model(**model_inputs)
print(f"finetune non onnx pytorch model output: {a.shape}")
# ...
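
For reference, create_model_for_provider is a helper defined earlier in the notebook rather than part of the transformers API. A minimal sketch of such a helper, assuming it simply wraps onnxruntime.InferenceSession (the options shown are illustrative, not necessarily the notebook's exact code):

from onnxruntime import GraphOptimizationLevel, InferenceSession, SessionOptions

def create_model_for_provider(model_path: str, provider: str) -> InferenceSession:
    # Basic session options; the notebook may tune these differently.
    options = SessionOptions()
    options.intra_op_num_threads = 1
    options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL
    # Load the ONNX graph and bind it to the requested execution provider.
    session = InferenceSession(model_path, options, providers=[provider])
    session.disable_fallback()
    return session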

Expected behavior

I was expecting the ONNX model's output shape to be the same as the non-converted model's output shape, but that's not the case:

Sequence output: (1, 6, 768), Pooled output: (1, 768)
finetune non onnx pytorch model output: torch.Size([1, 6])

It looks like the model's last layer, the one tied to the classification task, is not included in the ONNX export.
Does that make sense? @mfuntowicz
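
One way to confirm this from the ONNX side is to inspect the outputs the exported graph actually exposes through the onnxruntime session (reusing cpu_model from the snippet above; the expected printout is my assumption based on the shapes reported here):

# List the graph outputs of the exported model.
for out in cpu_model.get_outputs():
    print(out.name, out.shape)
# With the default export this should show the two feature-extraction outputs,
# e.g. (batch, sequence, 768) and (batch, 768), and no (batch, num_labels) logits.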

Environment info

Google Colab with a GPU

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 4
  • Comments: 17 (8 by maintainers)

Top GitHub Comments

4 reactions
manueltonneau commented, Jun 9, 2020

This seems to be related to this issue.

As @hrsmanian points out, it seems that in convert_graph_to_onnx.py the model is currently converted by default as a 'feature-extraction' pipeline, so the classification layer is discarded. Changing the pipeline type (line 108 of the .py file) to 'ner' seems to have worked in @hrsmanian's case.

In the case of binary classification, I tried changing the pipeline type to 'sentiment-analysis' (my model is a binary BertForSequenceClassification), but I get a ValueError (ValueError: not enough values to unpack (expected 2, got 1)) when trying to run the session. I used simpletransformers (which is based on this repo) to do binary classification with BERT, then followed the conversion and inference instructions from the blog post.

Let me know if you see what the problem is @mfuntowicz 😃
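
The unpacking error is consistent with a classification export exposing a single output (the logits) while the notebook snippet expects two. A sketch of how one might read such a graph's output, reusing cpu_model and inputs_onnx from the reproduction code above (the single-output assumption is mine, not confirmed in the thread):

# Run the session without assuming how many outputs the graph has.
outputs = cpu_model.run(None, inputs_onnx)
print(len(outputs))   # expected to be 1 for a sequence-classification export
logits = outputs[0]   # shape (batch, num_labels), e.g. (1, 2) for a binary model
print(logits.shape)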

3 reactions
mfuntowicz commented, Jun 9, 2020

@manueltonneau You're right, we're currently enforcing feature-extraction because not all our pipelines are compatible with the ONNX graph representation.

I'll have a look ASAP to identify which pipelines are compatible and which are not, so that we can add the possibility to export other kinds of pipelines through the script.
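
Until the script supports other pipeline types, one possible workaround (a sketch under my own assumptions, not the script's official path) is to bypass convert_graph_to_onnx and call torch.onnx.export on the sequence-classification model directly, which keeps the classification head in the graph. The output path and tensor names below are illustrative:

import os
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "mrm8488/distilroberta-base-finetuned-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Dummy inputs used only to trace the graph.
encoded = tokenizer.encode_plus("My name is Bert", return_tensors="pt")

os.makedirs("onnx", exist_ok=True)
torch.onnx.export(
    model,
    (encoded["input_ids"], encoded["attention_mask"]),
    "onnx/distilroberta-sentiment.onnx",  # illustrative output path
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=11,
)
# The exported graph now exposes a single "logits" output of shape (batch, num_labels).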

