Possibility to speed up inference of onnx models with transformers.pipeline
Problem Description
Using the model "Helsinki-NLP/opus-mt-es-en", I profiled where time is spent when running inference with the ONNX model versus the PyTorch model, and found that the main difference comes from one small step in beam search: scores = scores.masked_fill(banned_mask, -float("inf")). With the PyTorch model, this line takes only 0.10 ms per call, while with the ONNX model it takes close to 10 ms. My guess is that in the ONNX path each call pays some PyTorch initialization overhead. If this overhead could be reduced, inference with the ONNX model would become significantly more efficient.
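For reference, here is a minimal, self-contained sketch (not from the issue) for timing the masked_fill call in isolation; the vocabulary size, beam count, and banned-token set are assumptions chosen only for illustration, not values reported above.

import time
import torch

vocab_size = 65001           # assumption: roughly the opus-mt vocab size
num_beams = 4                # assumption
scores = torch.randn(num_beams, vocab_size)
banned_mask = torch.zeros(num_beams, vocab_size, dtype=torch.bool)
banned_mask[:, :100] = True  # pretend the first 100 token ids are banned

n_iters = 1000
start = time.perf_counter()
for _ in range(n_iters):
    masked = scores.masked_fill(banned_mask, -float("inf"))
per_call_ms = (time.perf_counter() - start) * 1000 / n_iters
print(f"masked_fill: {per_call_ms:.3f} ms per call")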
The following code also shows a significant timing difference between the two inference paths:
# https://github.com/huggingface/transformers/blob/v4.24.0/src/transformers/generation_logits_process.py
# Build a (1, vocab_size) boolean mask that is True at the ids of length-1 bad words,
# then move it to the same device as the scores.
static_bad_words_mask = torch.zeros(scores.shape[1])
static_bad_words_mask[self.bad_words_id_length_1] = 1
return static_bad_words_mask.unsqueeze(0).to(scores.device).bool()
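Since the issue asks whether "some measures can be taken to reduce the time here", one direction is to avoid rebuilding and transferring this mask at every scoring step. The sketch below is purely hypothetical (CachedBadWordsMask is my own name, not part of transformers): it constructs the boolean mask once and only applies masked_fill per step. Whether this actually closes the gap reported above would need to be confirmed by profiling.

import torch

class CachedBadWordsMask:
    # Hypothetical helper (not part of transformers): build the length-1 bad-words
    # mask once and reuse it at every decoding step instead of rebuilding it.
    def __init__(self, bad_words_id_length_1, vocab_size, device="cpu"):
        mask = torch.zeros(vocab_size)
        mask[bad_words_id_length_1] = 1
        self._mask = mask.unsqueeze(0).bool().to(device)  # shape (1, vocab_size)

    def apply(self, scores):
        # The mask broadcasts over the beam dimension; only masked_fill runs per step.
        return scores.masked_fill(self._mask, -float("inf"))

# usage sketch
masker = CachedBadWordsMask(bad_words_id_length_1=[3, 17], vocab_size=65001)
scores = torch.randn(4, 65001)
scores = masker.apply(scores)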
Model loading and inference
- pytorch model: model = AutoModelForSeq2SeqLM.from_pretrained("./bin_model"); result = model.generate(**model_inputs)
- onnx model: model = ORTModelForSeq2SeqLM.from_pretrained("./onnx_model", from_transformers=False); onnx_translation = pipeline("translation_es_to_en", model=model, tokenizer=tokenizer); result = onnx_translation(inputs) (a fuller sketch of both paths follows below)
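The sketch below expands the two loading paths above into runnable form. The local paths "./bin_model" and "./onnx_model" are taken from the issue as placeholders, the tokenizer checkpoint is assumed to be the same Helsinki-NLP/opus-mt-es-en model, and the input sentence is a hypothetical example.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
inputs = "Hola, ¿cómo estás?"  # hypothetical example sentence

# PyTorch path
pt_model = AutoModelForSeq2SeqLM.from_pretrained("./bin_model")  # local path from the issue
model_inputs = tokenizer(inputs, return_tensors="pt")
pt_result = pt_model.generate(**model_inputs)
print(tokenizer.batch_decode(pt_result, skip_special_tokens=True))

# ONNX path (optimum + transformers.pipeline)
onnx_model = ORTModelForSeq2SeqLM.from_pretrained("./onnx_model", from_transformers=False)
onnx_translation = pipeline("translation_es_to_en", model=onnx_model, tokenizer=tokenizer)
print(onnx_translation(inputs))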
Machine configuration
36-core CPU, no GPU
Issue Analytics
- Created 10 months ago
- Reactions: 1
- Comments: 10 (7 by maintainers)
Top GitHub Comments
Sorry, it’s not on an AWS EC2 instance, but on my own machine, so I can’t provide more information.
Awesome thanks! Is it on an AWS EC2 instance? If so could you give me the name so that I can reproduce there?