
Error when using optimum.pipeline according to the usage doc

See original GitHub issue

System Info

onnx                     1.12.0
onnxruntime-gpu          1.13.1
optimum                  1.5.0.dev0
transformers             4.24.0

Who can help?

Hi Optimum team, I hit this error when using pipeline for GPU inference, following the official usage documentation (https://github.com/huggingface/optimum/blob/main/docs/source/onnxruntime/usage_guides/pipelines.mdx#optimizing-with-ortoptimizer). Could you take a look? Thanks!

@philschmid @JingyaHuang @echarlaix FYI, the error stack includes code from optimum/onnxruntime/modeling_ort.py. I think it might be related to the recent IO binding enhancement.
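
For reference, a minimal sketch (assuming the exported file ends up at tmp/onnx/model_optimized.onnx, as in the repro below) to list the output names the ONNX model actually exposes:

import onnxruntime

# Assumed path: the optimized model written by ORTOptimizer in the repro below
sess = onnxruntime.InferenceSession("tmp/onnx/model_optimized.onnx", providers=["CPUExecutionProvider"])
print([output.name for output in sess.get_outputs()])
# A question-answering export typically exposes start_logits and end_logits
# rather than a single "logits" output, which matches the error below.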

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

I was following the usage documentation (https://github.com/huggingface/optimum/blob/main/docs/source/onnxruntime/usage_guides/pipelines.mdx#optimizing-with-ortoptimizer) to do some validation, and I hit a ValueError: output name logits not found error when using pipeline with:

  1. task: question-answering
  2. device: GPU
  3. model: deepset/roberta-large-squad2

Steps:

  1. Works with both CPU and GPU on model distilbert-base-uncased-finetuned-sst-2-english, task text-classification:
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig
from optimum.pipelines import pipeline
from transformers import AutoTokenizer

# Load the tokenizer and export the model to the ONNX format
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
save_dir = "tmp/onnx/"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)

# Load the optimization configuration detailing the optimization we wish to apply
optimization_config = OptimizationConfig(optimization_level=1)
optimizer = ORTOptimizer.from_pretrained(model)
# Apply optimization and save the resulting model
optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)

# Load the optimized model from a local repository
# CPU: OK
# model = ORTModelForSequenceClassification.from_pretrained(save_dir, file_name="model_optimized.onnx")
# GPU: OK
import torch
model = ORTModelForSequenceClassification.from_pretrained(save_dir, file_name="model_optimized.onnx").to(torch.device('cuda:0'))

# Create the transformers pipeline
onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer)
text = "I like the new ORT pipeline"
pred = onnx_clx(text)
print(pred)

# # Save and push the model to the hub
# tokenizer.save_pretrained("new_path_for_directory")
# model.save_pretrained("new_path_for_directory")
# model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
  2. Fails on GPU with model deepset/roberta-large-squad2, task question-answering:
from optimum.onnxruntime import ORTModelForQuestionAnswering, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig
from optimum.pipelines import pipeline
from transformers import AutoTokenizer

# Load the tokenizer and export the model to the ONNX format
model_id = "deepset/roberta-large-squad2"
save_dir = "tmp/onnx/"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForQuestionAnswering.from_pretrained(model_id, from_transformers=True)

# Load the optimization configuration detailing the optimization we wish to apply
optimization_config = OptimizationConfig(optimization_level=1)
optimizer = ORTOptimizer.from_pretrained(model)
# Apply optimization and save the resulting model
optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)

# Load the optimized model from a local repository
# CPU: OK
# model = ORTModelForQuestionAnswering.from_pretrained(save_dir, file_name="model_optimized.onnx")
# GPU: ERROR
import torch
model = ORTModelForQuestionAnswering.from_pretrained(save_dir, file_name="model_optimized.onnx").to(torch.device('cuda:0'))

# Create the transformers pipeline
onnx_clx = pipeline("question-answering", model=model, tokenizer=tokenizer)
QA_input = {
    'question': 'Why is model conversion important?',
    'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'
}
pred = onnx_clx(QA_input)
print(pred)

# # Save and push the model to the hub
# tokenizer.save_pretrained("new_path_for_directory")
# model.save_pretrained("new_path_for_directory")
# model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
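
As a side note (a minimal sketch, not part of the original report), whether the models loaded with .to(torch.device('cuda:0')) above are really being served on CUDA can be checked through the execution providers of the underlying onnxruntime session, assuming the InferenceSession is reachable as model.model as in optimum's ORTModel:

# Sketch: list the active execution providers of the loaded model
# (assumes model.model is the underlying onnxruntime.InferenceSession)
print(model.model.get_providers())
# Expected to include "CUDAExecutionProvider" when the model was moved to cuda:0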

Error information:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/question_answering.py", line 380, in __call__
    return super().__call__(examples[0], **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 1074, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 1096, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 990, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/question_answering.py", line 500, in _forward
    start, end = self.model(**model_inputs)[:2]
  File "/usr/local/lib/python3.8/dist-packages/optimum/modeling_base.py", line 60, in __call__
    return self.forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_ort.py", line 736, in forward
    io_binding, output_shapes, output_buffers = self.prepare_io_binding(
  File "/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_ort.py", line 693, in prepare_io_binding
    start_logits_shape, start_logits_buffer = self.prepare_logits_buffer(
  File "/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_ort.py", line 646, in prepare_logits_buffer
    ort_type = TypeHelper.get_output_type(self.model, "logits")
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/io_binding_helper.py", line 26, in get_output_type
    raise ValueError(f"output name {name} not found")
ValueError: output name logits not found
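
From the traceback, prepare_logits_buffer looks up a single output named "logits", while the question-answering export exposes start_logits and end_logits, so the lookup fails. As a stopgap (a sketch, not an official workaround), the optimized QA model can be kept on CPU, which the repro above shows working:

# Workaround sketch: run the QA pipeline on CPU until the GPU IO-binding path is fixed
model = ORTModelForQuestionAnswering.from_pretrained(save_dir, file_name="model_optimized.onnx")
onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(onnx_qa(QA_input))
# If the installed optimum version exposes a use_io_binding flag on from_pretrained,
# disabling it might also avoid this GPU buffer-preparation path (unverified).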

Expected behavior

We expected step 2 to run without error and produce a result like the one below:

Actual CPU result (Expected GPU result):

{'score': 0.30786821246147156, 'start': 59, 'end': 132, 'answer': 'gives freedom to the user and let people easily switch between frameworks'}

Issue Analytics

  • State: closed
  • Created: 10 months ago
  • Reactions: 1
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
JingyaHuang commented, Nov 11, 2022

Hi @bugzyz,

The fix for the QA pipeline has been merged into main. Yes, feel free to test the dev version with the command.
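
(The exact command is not shown on this page; as an assumption, installing the development version directly from the GitHub repository would look like this:)

# Assumed install command, not necessarily the one referenced above
pip install --upgrade git+https://github.com/huggingface/optimum.git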

1 reaction
fxmarty commented, Nov 10, 2022
