
Unable to use GPU accelerated Optimum Onnx transformer model for inference


System Info

Optimum Version: 1.5.0
Ubuntu 20.04 Linux 
Python version 3.8

Who can help?

@JingyaHuang @echarlaix When following the documentation at https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/gpu with Optimum 1.5.0, we get the following error:


RuntimeError                              Traceback (most recent call last)
<ipython-input-7-8429fcab1e09> in <module>
     19     "education",
     20     "music"]
---> 21 pred = onnx_z0(sequence_to_classify, candidate_labels, multi_class=False)

8 frames
/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in bind_input(self, name, device_type, device_id, element_type, shape, buffer_ptr)
    454         :param buffer_ptr: memory pointer to input data
    455         """
--> 456         self._iobinding.bind_input(
    457             name,
    458             C.OrtDevice(

RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]

This is reproducible on a Google Colab GPU instance as well. The error appears only from version 1.5.0 onward; 1.4.1 works as expected.
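
For context, the traceback comes from a zero-shot classification pipeline (onnx_z0) that is not shown in the Reproduction section below. A minimal sketch of how such a pipeline would be assembled with Optimum follows; the checkpoint, sequence, and label values are assumptions, and only the final onnx_z0(...) call is taken from the traceback.

# Hypothetical reconstruction of the call in the traceback above.
from optimum.onnxruntime import ORTModelForSequenceClassification
from optimum.pipelines import pipeline
from transformers import AutoTokenizer

model_id = "facebook/bart-large-mnli"  # assumed NLI checkpoint
model = ORTModelForSequenceClassification.from_pretrained(
    model_id, from_transformers=True, provider="CUDAExecutionProvider"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
onnx_z0 = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)

sequence_to_classify = "one day I will see the world"
candidate_labels = ["travel", "cooking", "education", "music"]
pred = onnx_z0(sequence_to_classify, candidate_labels, multi_class=False)  # fails as above on 1.5.0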

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

!pip install optimum[onnxruntime-gpu]==1.5.1
!pip install transformers onnx

from optimum.onnxruntime import ORTModelForSequenceClassification

ort_model = ORTModelForSequenceClassification.from_pretrained(
    "philschmid/tiny-bert-sst2-distilled",
    from_transformers=True,
    provider="CUDAExecutionProvider",
)

from optimum.pipelines import pipeline
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("philschmid/tiny-bert-sst2-distilled")

pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer)
result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
print(result)
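
Not part of the original report: a quick sanity check of where the exported model actually runs, plus the interim workaround implied by the report (pinning back to 1.4.1). The attribute names assume the ORTModel internals of the 1.5.x releases, where .model holds the ONNX Runtime InferenceSession.

# Sketch only: confirm which execution providers the session was created with
# and which device the ORTModel reports.
print(ort_model.model.get_providers())  # expect ['CUDAExecutionProvider', 'CPUExecutionProvider']
print(ort_model.device)                 # device the ORTModel believes it is on

# Interim workaround from the report: pin to the last known-good version.
# !pip install "optimum[onnxruntime-gpu]==1.4.1"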

Expected behavior

Inference should run on the GPU without errors; instead it fails with the device-binding error above, which is not expected.

Issue Analytics

  • State: closed
  • Created 9 months ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

1 reaction
fxmarty commented, Dec 13, 2022

For sure, thanks a lot! Don’t hesitate if you need any guidance!

0 reactions
fxmarty commented, Dec 20, 2022

@smiraldr So as I understand in fact it was a device indexing issue, @JingyaHuang fixed it in https://github.com/huggingface/optimum/pull/613. So your PR looks good as is, moving the discussion there!
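
The failing call in the traceback is ONNX Runtime's IOBinding step: the pipeline binds input tensors to a device, and the session must have a registered data-transfer path from that device. The sketch below is not from the issue; it only illustrates the kind of binding the device-indexing fix corrects, using plain onnxruntime with an assumed model path and assumed input/output names.

# Illustration only: explicit IOBinding with onnxruntime. "model.onnx",
# "input_ids" and "logits" are assumptions. If a tensor is bound to a device
# the session cannot copy from (the device type/id mismatch fixed upstream),
# onnxruntime raises the "no data transfer registered" RuntimeError above.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
io_binding = sess.io_binding()

input_ids = np.zeros((1, 8), dtype=np.int64)
# Copy the input to CUDA device 0 and bind it; other inputs such as
# attention_mask would be bound the same way.
ort_value = ort.OrtValue.ortvalue_from_numpy(input_ids, "cuda", 0)
io_binding.bind_ortvalue_input("input_ids", ort_value)
# Let ONNX Runtime allocate the output on the same device.
io_binding.bind_output("logits", "cuda", 0)

sess.run_with_iobinding(io_binding)
logits = io_binding.copy_outputs_to_cpu()[0]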
