Error when using optimum.pipeline according to the usage doc
System Info
onnx 1.12.0
onnxruntime-gpu 1.13.1
optimum 1.5.0.dev0
transformers 4.24.0
Who can help?
Hi Optimum team, I ran into this error when using pipeline for GPU inference, following the official usage documentation (https://github.com/huggingface/optimum/blob/main/docs/source/onnxruntime/usage_guides/pipelines.mdx#optimizing-with-ortoptimizer). Could you help take a look? Thanks!
@philschmid @JingyaHuang @echarlaix FYI, the error stack trace includes code in optimum/onnxruntime/modeling_ort.py, so I think it might be related to the recent IO binding enhancement.
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
I was following the usage documentation (https://github.com/huggingface/optimum/blob/main/docs/source/onnxruntime/usage_guides/pipelines.mdx#optimizing-with-ortoptimizer) to do some validation, and I hit a ValueError: output name logits not found error when using pipeline with:
- task: question-answering
- device: GPU
- model: deepset/roberta-large-squad2
Steps:
- Works with CPU/GPU on model distilbert-base-uncased-finetuned-sst-2-english, task text-classification (see the diagnostic sketch after this code block):
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig
from optimum.pipelines import pipeline
from transformers import AutoTokenizer
# Load the tokenizer and export the model to the ONNX format
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
save_dir = "tmp/onnx/"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)
# Load the optimization configuration detailing the optimization we wish to apply
optimization_config = OptimizationConfig(optimization_level=1)
optimizer = ORTOptimizer.from_pretrained(model)
# Apply optimization and save the resulting model
optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)
# Load the optimized model from a local repository
# CPU: OK
# model = ORTModelForSequenceClassification.from_pretrained(save_dir, file_name="model_optimized.onnx")
# GPU: OK
import torch
model = ORTModelForSequenceClassification.from_pretrained(save_dir, file_name="model_optimized.onnx").to(torch.device('cuda:0'))
# Create the transformers pipeline
onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer)
text = "I like the new ORT pipeline"
pred = onnx_clx(text)
print(pred)
# # Save and push the model to the hub
# tokenizer.save_pretrained("new_path_for_directory")
# model.save_pretrained("new_path_for_directory")
# model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
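For context, one way to see why this case also runs on GPU is to check which output names the optimized ONNX model declares; a sequence-classification export is expected to expose a single logits output, the exact name referenced in the ValueError mentioned above. A minimal diagnostic sketch (assuming the tmp/onnx/ directory used above, inspected before it is overwritten by step 2):

```python
import onnxruntime as ort

# Diagnostic sketch: list the output names declared by the optimized model.
# A sequence-classification export is expected to expose a single 'logits'
# output, which is exactly the name the IO binding code looks up.
session = ort.InferenceSession("tmp/onnx/model_optimized.onnx", providers=["CPUExecutionProvider"])
print([output.name for output in session.get_outputs()])  # expected: ['logits']
```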
- Error when run with GPU on model deepset/roberta-large-squad2, task question-answering:
from optimum.onnxruntime import ORTModelForQuestionAnswering, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig
from optimum.pipelines import pipeline
from transformers import AutoTokenizer
# Load the tokenizer and export the model to the ONNX format
model_id = "deepset/roberta-large-squad2"
save_dir = "tmp/onnx/"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForQuestionAnswering.from_pretrained(model_id, from_transformers=True)
# Load the optimization configuration detailing the optimization we wish to apply
optimization_config = OptimizationConfig(optimization_level=1)
optimizer = ORTOptimizer.from_pretrained(model)
# Apply optimization and save the resulting model
optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)
# Load the optimized model from a local repository
# CPU: OK
# model = ORTModelForQuestionAnswering.from_pretrained(save_dir, file_name="model_optimized.onnx")
# GPU: ERROR
import torch
model = ORTModelForQuestionAnswering.from_pretrained(save_dir, file_name="model_optimized.onnx").to(torch.device('cuda:0'))
# Create the transformers pipeline
onnx_clx = pipeline("question-answering", model=model, tokenizer=tokenizer)
QA_input = {
'question': 'Why is model conversion important?',
'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'
}
pred = onnx_clx(QA_input)
print(pred)
# # Save and push the model to the hub
# tokenizer.save_pretrained("new_path_for_directory")
# model.save_pretrained("new_path_for_directory")
# model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
Error information:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/question_answering.py", line 380, in __call__
return super().__call__(examples[0], **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 1074, in __call__
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 1096, in run_single
model_outputs = self.forward(model_inputs, **forward_params)
File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 990, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/question_answering.py", line 500, in _forward
start, end = self.model(**model_inputs)[:2]
File "/usr/local/lib/python3.8/dist-packages/optimum/modeling_base.py", line 60, in __call__
return self.forward(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_ort.py", line 736, in forward
io_binding, output_shapes, output_buffers = self.prepare_io_binding(
File "/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_ort.py", line 693, in prepare_io_binding
start_logits_shape, start_logits_buffer = self.prepare_logits_buffer(
File "/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_ort.py", line 646, in prepare_logits_buffer
ort_type = TypeHelper.get_output_type(self.model, "logits")
File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/io_binding_helper.py", line 26, in get_output_type
raise ValueError(f"output name {name} not found")
ValueError: output name logits not found
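The traceback is consistent with an output-name mismatch: prepare_logits_buffer asks the session for an output literally named logits, while a question-answering export declares start_logits and end_logits. A minimal sketch that reproduces the failing lookup in isolation (assuming the optimized QA model from step 2 is in tmp/onnx/):

```python
import onnxruntime as ort
from onnxruntime.transformers.io_binding_helper import TypeHelper

# Sketch: a question-answering export is expected to declare
# ['start_logits', 'end_logits'], so the lookup for an output named 'logits'
# performed by prepare_logits_buffer raises the ValueError shown above.
session = ort.InferenceSession("tmp/onnx/model_optimized.onnx", providers=["CPUExecutionProvider"])
print([output.name for output in session.get_outputs()])
TypeHelper.get_output_type(session, "logits")  # raises ValueError: output name logits not found
```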
Expected behavior
We expect the result of step 2 to look like the output below, without any error:
Actual CPU result (Expected GPU result):
{'score': 0.30786821246147156, 'start': 59, 'end': 132, 'answer': 'gives freedom to the user and let people easily switch between frameworks'}
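Until a fix lands, a possible workaround might be to disable IO binding when loading the model, so that GPU inference goes through the regular ONNX Runtime path. This is only a sketch and assumes the installed optimum build exposes a use_io_binding argument on from_pretrained, which may not be the case for 1.5.0.dev0:

```python
import torch
from optimum.onnxruntime import ORTModelForQuestionAnswering

# Hypothetical workaround: turn IO binding off so forward() does not go through
# prepare_io_binding(). The `use_io_binding` argument is an assumption and may
# not be available in the installed version.
model = ORTModelForQuestionAnswering.from_pretrained(
    "tmp/onnx/",
    file_name="model_optimized.onnx",
    use_io_binding=False,
).to(torch.device("cuda:0"))
```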
Top GitHub Comments
Hi @bugzyz,
The fix for the QA pipeline has been merged to main. Yeah, feel free to test the dev version with the command.
Fixed in https://github.com/huggingface/optimum/pull/454
@JingyaHuang We should merge imo