
Possibility to speed up inference of onnx models with transformers.pipeline

See original GitHub issue

Problem Description

Based on the model "Helsinki-NLP/opus-mt-es-en", I investigated where the time goes when running inference with the ONNX model versus the PyTorch model, and found that the main difference lies in one small step of beam search: scores = scores.masked_fill(banned_mask, -float("inf")). When I use the PyTorch model for inference, this line takes only about 0.10 ms per execution, while with the ONNX model it takes close to 10 ms. I suspect that each execution needs to set up the corresponding PyTorch context, which adds some initialization overhead. If this step could be made cheaper, inference with the ONNX model would become significantly faster.
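
As a rough way to sanity-check the number above, the cost of this masking step can be timed in isolation; the sketch below uses placeholder shapes and an arbitrary set of banned token ids, neither of which comes from the issue:

import time
import torch

# Hypothetical shapes: 4 beams over a ~65k-token vocabulary (not values from the issue).
scores = torch.randn(4, 65001)
banned_mask = torch.zeros_like(scores, dtype=torch.bool)
banned_mask[:, ::1000] = True  # arbitrary banned token positions

start = time.perf_counter()
for _ in range(100):
    out = scores.masked_fill(banned_mask, -float("inf"))
elapsed_ms = (time.perf_counter() - start) / 100 * 1000
print(f"masked_fill: {elapsed_ms:.3f} ms per call")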

The execution time of the following code also differs significantly between the two inference methods:

# https://github.com/huggingface/transformers/blob/v4.24.0/src/transformers/generation_logits_process.py
# Build a (1, vocab_size) boolean mask marking single-token bad words.
static_bad_words_mask = torch.zeros(scores.shape[1])
static_bad_words_mask[self.bad_words_id_length_1] = 1
return static_bad_words_mask.unsqueeze(0).to(scores.device).bool()
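
For context, the mask built above is the banned_mask that the masked_fill line mentioned earlier applies to the beam-search scores at every generation step; a small sketch of how the two pieces fit together, with illustrative shapes and bad-word ids that are not taken from the issue:

import torch

vocab_size = 65001                      # illustrative vocabulary size
scores = torch.randn(4, vocab_size)     # (num_beams, vocab_size) logits at one step
bad_words_id_length_1 = [5, 42, 1000]   # hypothetical single-token bad words

# Build the static mask, as in the snippet above ...
static_bad_words_mask = torch.zeros(vocab_size)
static_bad_words_mask[bad_words_id_length_1] = 1
banned_mask = static_bad_words_mask.unsqueeze(0).to(scores.device).bool()

# ... then ban those tokens by pushing their scores to -inf.
scores = scores.masked_fill(banned_mask, -float("inf"))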

Model loading and inference

  1. PyTorch model:
     model = AutoModelForSeq2SeqLM.from_pretrained("./bin_model")
     result = model.generate(**model_inputs)
  2. ONNX model:
     model = ORTModelForSeq2SeqLM.from_pretrained("./onnx_model", from_transformers=False)
     onnx_translation = pipeline("translation_es_to_en", model=model, tokenizer=tokenizer)
     result = onnx_translation(inputs)
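
Putting the second path together end to end, a minimal sketch assuming the optimum package is installed and ./onnx_model contains an already-exported opus-mt-es-en checkpoint (the input sentence is a placeholder):

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Paths and the input sentence are placeholders, not values from the issue.
tokenizer = AutoTokenizer.from_pretrained("./onnx_model")
model = ORTModelForSeq2SeqLM.from_pretrained("./onnx_model", from_transformers=False)

onnx_translation = pipeline("translation_es_to_en", model=model, tokenizer=tokenizer)
result = onnx_translation("Hola, ¿cómo estás?")
print(result)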

Machine configuration

36-core CPU, no GPU

Issue Analytics

  • State: open
  • Created: 10 months ago
  • Reactions: 1
  • Comments: 10 (7 by maintainers)

Top GitHub Comments

1 reaction
young-chao commented, Dec 8, 2022

> Awesome thanks! Is it on an AWS EC2 instance? If so could you give me the name so that I can reproduce there?

Sorry, it’s not on an AWS EC2 instance, but on my own machine, so I can’t provide more information.

0 reactions
fxmarty commented, Dec 7, 2022

Awesome thanks! Is it on an AWS EC2 instance? If so could you give me the name so that I can reproduce there?


Top Results From Across the Web

Accelerated Inference with Optimum and Transformers Pipelines
Inference has landed in Optimum with support for Hugging Face Transformers pipelines, including text-generation using ONNX Runtime.

NLP Transformers pipelines with ONNX | by Thomas Chaigneau
It is portable, open-source and really awesome to boost inference speed without sacrificing accuracy. I found a lot of articles about ONNX ...

Hugging Face NLP Transformers pipelines with ONNX - GitHub
It is portable, open-source and really awesome to boost inference speed without sacrificing accuracy.

Faster Inference for NLP Pipelines using Hugging Face ...
Faster Inference: Optimizing Transformer model with HF and ONNX Runtime. We will be downloading a pretrained BERT model and converting it to ...

Microsoft open sources breakthrough optimizations for ...
Microsoft has open sourced enhanced versions of transformer inference optimizations into the ONNX Runtime and extended them to work on both ...
