Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

sentence bert model in onnx format

See original GitHub issue

I would like to convert sentence bert model from pytorch to tensorflow use onnx, and tried to follow the standard onnx procedure for converting a pytorch model. But I’m having difficulty determining the onnx input arguments for sentence bert model, I encounter TypeError: forward() takes 2 positional arguments but 4 were given. Suggestions appreciated! model = SentenceTransformer('output/continue_training_model') device = torch.device("cuda" if torch.cuda.is_available() else "cpu") model.to(device) dummy_input0 = torch.LongTensor(batch_size, max_seq_length).to(device) dummy_input1 = torch.LongTensor(batch_size, max_seq_length).to(device) dummy_input2 = torch.LongTensor(batch_size, max_seq_length).to(device) torch.onnx.export(model,(dummy_input0, dummy_input1,dummy_input2), onnx_file_name, verbose=True)

Issue Analytics

State:
Created 4 years ago
Comments:25 (5 by maintainers)

Top GitHub Comments

4reactions

raphaelstycommented, Jun 11, 2022

Hi,

I had some trouble converting the sentence-transformers/all-mpnet-base-v2 model with onnx format so I’ll share with you a class and a function that I have made with @yuanzhoulvpi2017 tutorial (it was helpful, thank you).

I’ve done some tests and I tend to measure a 4x speedup using onnx format. I’m not sure my code is fully optimised.

import torch
import transformers
from sentence_transformers import SentenceTransformer, models


class OnnxEncoder:
    """OnxEncoder dedicated to run SentenceTransformer under OnnxRuntime."""

    def __init__(self, session, tokenizer, pooling, normalization):
        self.session = session
        self.tokenizer = tokenizer
        self.max_length = tokenizer.__dict__["model_max_length"]
        self.pooling = pooling
        self.normalization = normalization

    def encode(self, sentences: list):

        sentences = [sentences] if isinstance(sentences, str) else sentences

        inputs = {
            k: v.numpy()
            for k, v in self.tokenizer(
                sentences,
                padding=True,
                truncation=True,
                max_length=self.max_length,
                return_tensors="pt",
            ).items()
        }

        hidden_state = self.session.run(None, inputs)
        sentence_embedding = self.pooling.forward(
            features={
                "token_embeddings": torch.Tensor(hidden_state[0]),
                "attention_mask": torch.Tensor(inputs.get("attention_mask")),
            },
        )

        if self.normalization is not None:
            sentence_embedding = self.normalization.forward(features=sentence_embedding)

        sentence_embedding = sentence_embedding["sentence_embedding"]

        if sentence_embedding.shape[0] == 1:
            sentence_embedding = sentence_embedding[0]

        return sentence_embedding.numpy()


def sentence_transformers_onnx(
    model,
    path,
    do_lower_case=True,
    input_names=["input_ids", "attention_mask", "segment_ids"],
    providers=["CPUExecutionProvider"],
):
    """OnxRuntime for sentence transformers.

    Parameters
    ----------
    model
        SentenceTransformer model.
    path
        Model file dedicated to session inference.
    do_lower_case
        Either or not the model is cased.
    input_names
        Fields needed by the Transformer.
    providers
        Either run the model on CPU or GPU: ["CPUExecutionProvider", "CUDAExecutionProvider"].

    """
    try:
        import onnxruntime
    except:
        raise ValueError("You need to install onnxruntime.")

    model.save(path)

    configuration = transformers.AutoConfig.from_pretrained(
        path, from_tf=False, local_files_only=True
    )

    tokenizer = transformers.AutoTokenizer.from_pretrained(
        path, do_lower_case=do_lower_case, from_tf=False, local_files_only=True
    )

    encoder = transformers.AutoModel.from_pretrained(
        path, from_tf=False, config=configuration, local_files_only=True
    )

    st = ["cherche"]

    inputs = tokenizer(
        st,
        padding=True,
        truncation=True,
        max_length=tokenizer.__dict__["model_max_length"],
        return_tensors="pt",
    )

    model.eval()

    with torch.no_grad():

        symbolic_names = {0: "batch_size", 1: "max_seq_len"}

        torch.onnx.export(
            encoder,
            args=tuple(inputs.values()),
            f=f"{path}.onx",
            opset_version=13,  # ONX version needs to be >= 13 for sentence transformers.
            do_constant_folding=True,
            input_names=input_names,
            output_names=["start", "end"],
            dynamic_axes={
                "input_ids": symbolic_names,
                "attention_mask": symbolic_names,
                "segment_ids": symbolic_names,
                "start": symbolic_names,
                "end": symbolic_names,
            },
        )

        normalization = None
        for modules in model.modules():
            for idx, module in enumerate(modules):
                if idx == 1:
                    pooling = module
                if idx == 2:
                    normalization = module
            break

        return OnnxEncoder(
            session=onnxruntime.InferenceSession(f"{path}.onx", providers=providers),
            tokenizer=tokenizer,
            pooling=pooling,
            normalization=normalization,
        )

The sentence_transformers_onx function returns a model with a method encode that behave like SentenceTransformers models.

model = sentence_transformers_onnx(
    model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2"),
    path = "onnx_model",
)

Raphaël

4reactions

nreimerscommented, Jul 10, 2020

Hi @ycgui I started to add the models to HuggingFace Models Hub: https://huggingface.co/sentence-transformers

Huggingface also provides methods / scripts to convert models to ONNX.

I hope this helps.

Top Results From Across the Web

Export to ONNX - Transformers - Hugging Face

This exports an ONNX graph of the checkpoint defined by the --model argument. In this example, it is distilbert-base-uncased , but it can...

Accelerate Sentence Transformers with Hugging Face Optimum

The session will show you how to dynamically quantize and optimize a MiniLM Sentence Transformers model using Hugging Face Optimum and ONNX ......

Export Fine-tuned Bert model to ONNX and inference using ...

In this post, a fine-tuned XLM-Roberta Bert model will be exported as onnx format and the exported onnx model will be inferred on...

Inference with C# BERT NLP and ONNX Runtime | onnxruntime

Hugging Face has a great API for downloading open source models and then we can use python and Pytorch to export them to...

NLP Transformers pipelines with ONNX - Towards Data Science

For this example, we can use any TokenClassification model from Hugging Face's library because the task we are trying to solve is NER...

Troubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.

Start Free

Top Related Reddit Thread

No results found

Top Related Tweet

No results found

Top Related Dev.to Post

No results found

sentence bert model in onnx format

Issue Analytics

Top GitHub Comments

Top Results From Across the Web

Top Related Medium Post

Top Related StackOverflow Question

Troubleshoot Live Code

Top Related Reddit Thread

Top Related Hackernoon Post

Top Related Tweet

Top Related Dev.to Post

Top Related Hashnode Post

reproducing the paper's best results

Downloading Pre-Trained Model Failed