
Sentence-BERT model in ONNX format

See original GitHub issue

I would like to convert a Sentence-BERT model from PyTorch to TensorFlow via ONNX, and I tried to follow the standard ONNX procedure for converting a PyTorch model. But I’m having difficulty determining the ONNX input arguments for the Sentence-BERT model; I get TypeError: forward() takes 2 positional arguments but 4 were given. Suggestions appreciated!

model = SentenceTransformer('output/continue_training_model')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
dummy_input0 = torch.LongTensor(batch_size, max_seq_length).to(device)
dummy_input1 = torch.LongTensor(batch_size, max_seq_length).to(device)
dummy_input2 = torch.LongTensor(batch_size, max_seq_length).to(device)
torch.onnx.export(model, (dummy_input0, dummy_input1, dummy_input2), onnx_file_name, verbose=True)
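
The immediate cause of the TypeError is that SentenceTransformer is an nn.Sequential-style wrapper: its forward() takes a single features dict, so torch.onnx.export cannot call it with three positional tensors. A minimal sketch of one workaround (an assumption on my part, mirroring the fuller answer further down) is to export the wrapped Hugging Face module instead and keep pooling in PyTorch:

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('output/continue_training_model')
encoder = model[0].auto_model  # the wrapped Hugging Face transformer; its forward() takes separate tensors
encoder.eval()

# Use valid token ids: torch.LongTensor(b, s) is uninitialized memory and may
# contain out-of-vocabulary values.
batch_size, max_seq_length = 1, 128
dummy_ids = torch.ones(batch_size, max_seq_length, dtype=torch.long)
dummy_mask = torch.ones(batch_size, max_seq_length, dtype=torch.long)

torch.onnx.export(
    encoder,
    (dummy_ids, dummy_mask),
    "sbert_encoder.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["token_embeddings"],
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "max_seq_len"},
        "attention_mask": {0: "batch_size", 1: "max_seq_len"},
    },
)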

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 25 (5 by maintainers)

Top GitHub Comments

4 reactions
raphaelsty commented, Jun 11, 2022

Hi,

I had some trouble converting the sentence-transformers/all-mpnet-base-v2 model to ONNX format, so I’ll share a class and a function that I made following @yuanzhoulvpi2017’s tutorial (it was helpful, thank you).

I’ve done some tests and I typically measure a ~4x speedup with the ONNX format, though I’m not sure my code is fully optimised.

import torch
import transformers
from sentence_transformers import SentenceTransformer, models


class OnnxEncoder:
    """OnxEncoder dedicated to run SentenceTransformer under OnnxRuntime."""

    def __init__(self, session, tokenizer, pooling, normalization):
        self.session = session
        self.tokenizer = tokenizer
        self.max_length = tokenizer.model_max_length
        self.pooling = pooling
        self.normalization = normalization

    def encode(self, sentences: list):

        sentences = [sentences] if isinstance(sentences, str) else sentences

        # Tokenize, then convert the PyTorch tensors to numpy arrays for onnxruntime.
        inputs = {
            k: v.numpy()
            for k, v in self.tokenizer(
                sentences,
                padding=True,
                truncation=True,
                max_length=self.max_length,
                return_tensors="pt",
            ).items()
        }

        # Run the ONNX session, then reuse the original pooling module to turn
        # token embeddings into a sentence embedding.
        hidden_state = self.session.run(None, inputs)
        sentence_embedding = self.pooling.forward(
            features={
                "token_embeddings": torch.Tensor(hidden_state[0]),
                "attention_mask": torch.Tensor(inputs.get("attention_mask")),
            },
        )

        if self.normalization is not None:
            sentence_embedding = self.normalization.forward(features=sentence_embedding)

        sentence_embedding = sentence_embedding["sentence_embedding"]

        if sentence_embedding.shape[0] == 1:
            sentence_embedding = sentence_embedding[0]

        return sentence_embedding.numpy()


def sentence_transformers_onnx(
    model,
    path,
    do_lower_case=True,
    input_names=["input_ids", "attention_mask", "segment_ids"],
    providers=["CPUExecutionProvider"],
):
    """OnxRuntime for sentence transformers.

    Parameters
    ----------
    model
        SentenceTransformer model.
    path
        Model file dedicated to session inference.
    do_lower_case
        Whether or not to lowercase the input (for uncased models).
    input_names
        Fields needed by the Transformer.
    providers
        Either run the model on CPU or GPU: ["CPUExecutionProvider", "CUDAExecutionProvider"].

    """
    try:
        import onnxruntime
    except ImportError:
        raise ImportError("You need to install onnxruntime.")

    model.save(path)

    configuration = transformers.AutoConfig.from_pretrained(
        path, from_tf=False, local_files_only=True
    )

    tokenizer = transformers.AutoTokenizer.from_pretrained(
        path, do_lower_case=do_lower_case, from_tf=False, local_files_only=True
    )

    encoder = transformers.AutoModel.from_pretrained(
        path, from_tf=False, config=configuration, local_files_only=True
    )

    st = ["cherche"]

    inputs = tokenizer(
        st,
        padding=True,
        truncation=True,
        max_length=tokenizer.model_max_length,
        return_tensors="pt",
    )

    model.eval()

    with torch.no_grad():

        symbolic_names = {0: "batch_size", 1: "max_seq_len"}

        torch.onnx.export(
            encoder,
            args=tuple(inputs.values()),
            f=f"{path}.onx",
            opset_version=13,  # ONX version needs to be >= 13 for sentence transformers.
            do_constant_folding=True,
            input_names=input_names,
            output_names=["start", "end"],
            dynamic_axes={
                "input_ids": symbolic_names,
                "attention_mask": symbolic_names,
                "segment_ids": symbolic_names,
                "start": symbolic_names,
                "end": symbolic_names,
            },
        )

        # Recover the pooling (index 1) and optional normalization (index 2)
        # modules from the SentenceTransformer pipeline.
        normalization = None
        for modules in model.modules():
            for idx, module in enumerate(modules):
                if idx == 1:
                    pooling = module
                if idx == 2:
                    normalization = module
            break

        return OnnxEncoder(
            session=onnxruntime.InferenceSession(f"{path}.onnx", providers=providers),
            tokenizer=tokenizer,
            pooling=pooling,
            normalization=normalization,
        )

The sentence_transformers_onnx function returns a model with an encode method that behaves like the SentenceTransformer one.

model = sentence_transformers_onnx(
    model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2"),
    path = "onnx_model",
)
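
As a quick sanity check and rough benchmark (a sketch only; the ~4x figure above will vary with hardware, batch size, and sequence length):

import time
from sentence_transformers import SentenceTransformer

baseline = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
onnx_model = sentence_transformers_onnx(model=baseline, path="onnx_model")

sentences = ["ONNX export sanity check."] * 32

start = time.perf_counter()
baseline_embeddings = baseline.encode(sentences)
print(f"PyTorch: {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
onnx_embeddings = onnx_model.encode(sentences)
print(f"ONNX:    {time.perf_counter() - start:.3f}s")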

Raphaël

4 reactions
nreimers commented, Jul 10, 2020

Hi @ycgui, I started adding the models to the Hugging Face Model Hub: https://huggingface.co/sentence-transformers

Hugging Face also provides methods and scripts to convert models to ONNX.
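
For example, later transformers releases ship a command-line export module (this invocation postdates the comment, so check it against the current documentation):

python -m transformers.onnx --model=sentence-transformers/all-mpnet-base-v2 onnx/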

I hope this helps.

Read more comments on GitHub
