Unable to use GPU-accelerated Optimum ONNX transformer model for inference
System Info
- Optimum version: 1.5.0
- Ubuntu 20.04 (Linux)
- Python 3.8
Who can help?
@JingyaHuang @echarlaix

When following the documentation at https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/gpu with Optimum 1.5.0, we get the following error:
```
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-8429fcab1e09> in <module>
     19     "education",
     20     "music"]
---> 21 pred = onnx_z0(sequence_to_classify, candidate_labels, multi_class=False)

8 frames
/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in bind_input(self, name, device_type, device_id, element_type, shape, buffer_ptr)
    454         :param buffer_ptr: memory pointer to input data
    455         """
--> 456         self._iobinding.bind_input(
    457             name,
    458             C.OrtDevice(

RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]
```
This is reproducible on a Google Colab GPU instance as well. The error appears starting with version 1.5.0; 1.4.1 works as expected. (In onnxruntime's OrtDevice enum, DeviceType:1 is GPU and DeviceType:0 is CPU, so IO binding is failing to copy an input tensor between a GPU device and a CPU device for which no transfer is registered.)
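Since IO binding support was introduced in 1.5.0, which is exactly where the regression appears, disabling it may work around the problem until a fix lands. A minimal sketch, assuming the `use_io_binding` argument of `from_pretrained` (added alongside the IO binding support in 1.5.x) behaves as documented:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Workaround sketch: disable IO binding so onnxruntime handles the
# host/device copies itself; pinning optimum==1.4.1 also avoids the bug.
ort_model = ORTModelForSequenceClassification.from_pretrained(
    "philschmid/tiny-bert-sst2-distilled",
    from_transformers=True,
    provider="CUDAExecutionProvider",
    use_io_binding=False,
)
```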
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
```
!pip install optimum[onnxruntime-gpu]==1.5.1
!pip install transformers onnx
```
```python
from optimum.onnxruntime import ORTModelForSequenceClassification

ort_model = ORTModelForSequenceClassification.from_pretrained(
    "philschmid/tiny-bert-sst2-distilled",
    from_transformers=True,
    provider="CUDAExecutionProvider",
)

from optimum.pipelines import pipeline
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("philschmid/tiny-bert-sst2-distilled")

pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer)
result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
print(result)
```
Expected behavior
Inference should run successfully on GPU; instead it fails with the device-binding error above.
Top GitHub Comments
For sure, thanks a lot! Don’t hesitate if you need any guidance!
@smiraldr As I understand it, this was in fact a device-indexing issue, which @JingyaHuang fixed in https://github.com/huggingface/optimum/pull/613. Your PR looks good as is; moving the discussion there!
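Until the fix from that PR is included in a release, installing Optimum from the main branch (e.g. `pip install git+https://github.com/huggingface/optimum.git`, alongside an existing `onnxruntime-gpu` install) is one way to pick it up.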