
Error when using optimum.pipeline according to the usage doc

See original GitHub issue

System Info

onnx                     1.12.0
onnxruntime-gpu          1.13.1
optimum                  1.5.0.dev0
transformers             4.24.0

Who can help?

Hi Optimum team, I hit this error when using pipeline for GPU inference, following the official usage documentation (https://github.com/huggingface/optimum/blob/main/docs/source/onnxruntime/usage_guides/pipelines.mdx#optimizing-with-ortoptimizer). Could you take a look? Thanks!

@philschmid @JingyaHuang @echarlaix FYI, the error stack includes code from optimum/onnxruntime/modeling_ort.py. I think it might be related to the recent IO binding enhancement.
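
For reference, a minimal sketch (assuming the exported file ends up at tmp/onnx/model_optimized.onnx, as in the repro below) to list the output names the ONNX model actually exposes:

import onnxruntime

# Assumed path: the optimized model written by ORTOptimizer in the repro below
sess = onnxruntime.InferenceSession("tmp/onnx/model_optimized.onnx", providers=["CPUExecutionProvider"])
print([output.name for output in sess.get_outputs()])
# A question-answering export typically exposes start_logits and end_logits
# rather than a single "logits" output, which matches the error below.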

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

I was following the usage documentation (https://github.com/huggingface/optimum/blob/main/docs/source/onnxruntime/usage_guides/pipelines.mdx#optimizing-with-ortoptimizer) to do some validation, and I hit a ValueError: output name logits not found error when using pipeline with:

  1. task: question-answering
  2. device: GPU
  3. model: deepset/roberta-large-squad2

Steps:

  1. Works with both CPU and GPU on model distilbert-base-uncased-finetuned-sst-2-english, task text-classification:
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig
from optimum.pipelines import pipeline
from transformers import AutoTokenizer

# Load the tokenizer and export the model to the ONNX format
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
save_dir = "tmp/onnx/"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)

# Load the optimization configuration detailing the optimization we wish to apply
optimization_config = OptimizationConfig(optimization_level=1)
optimizer = ORTOptimizer.from_pretrained(model)
# Apply optimization and save the resulting model
optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)

# Load the optimized model from a local repository
# CPU: OK
# model = ORTModelForSequenceClassification.from_pretrained(save_dir, file_name="model_optimized.onnx")
# GPU: OK
import torch
model = ORTModelForSequenceClassification.from_pretrained(save_dir, file_name="model_optimized.onnx").to(torch.device('cuda:0'))

# Create the transformers pipeline
onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer)
text = "I like the new ORT pipeline"
pred = onnx_clx(text)
print(pred)

# # Save and push the model to the hub
# tokenizer.save_pretrained("new_path_for_directory")
# model.save_pretrained("new_path_for_directory")
# model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
  2. Fails on GPU with model deepset/roberta-large-squad2, task question-answering:
from optimum.onnxruntime import ORTModelForQuestionAnswering, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig
from optimum.pipelines import pipeline
from transformers import AutoTokenizer

# Load the tokenizer and export the model to the ONNX format
model_id = "deepset/roberta-large-squad2"
save_dir = "tmp/onnx/"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForQuestionAnswering.from_pretrained(model_id, from_transformers=True)

# Load the optimization configuration detailing the optimization we wish to apply
optimization_config = OptimizationConfig(optimization_level=1)
optimizer = ORTOptimizer.from_pretrained(model)
# Apply optimization and save the resulting model
optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)

# Load the optimized model from a local repository
# CPU: OK
# model = ORTModelForQuestionAnswering.from_pretrained(save_dir, file_name="model_optimized.onnx")
# GPU: ERROR
import torch
model = ORTModelForQuestionAnswering.from_pretrained(save_dir, file_name="model_optimized.onnx").to(torch.device('cuda:0'))

# Create the transformers pipeline
onnx_clx = pipeline("question-answering", model=model, tokenizer=tokenizer)
QA_input = {
    'question': 'Why is model conversion important?',
    'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'
}
pred = onnx_clx(QA_input)
print(pred)

# # Save and push the model to the hub
# tokenizer.save_pretrained("new_path_for_directory")
# model.save_pretrained("new_path_for_directory")
# model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
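
As a side note (a minimal sketch, not part of the original report), whether the models loaded with .to(torch.device('cuda:0')) above are really being served on CUDA can be checked through the execution providers of the underlying onnxruntime session, assuming the InferenceSession is reachable as model.model as in optimum's ORTModel:

# Sketch: list the active execution providers of the loaded model
# (assumes model.model is the underlying onnxruntime.InferenceSession)
print(model.model.get_providers())
# Expected to include "CUDAExecutionProvider" when the model was moved to cuda:0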

Error information:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/question_answering.py", line 380, in __call__
    return super().__call__(examples[0], **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 1074, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 1096, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 990, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/question_answering.py", line 500, in _forward
    start, end = self.model(**model_inputs)[:2]
  File "/usr/local/lib/python3.8/dist-packages/optimum/modeling_base.py", line 60, in __call__
    return self.forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_ort.py", line 736, in forward
    io_binding, output_shapes, output_buffers = self.prepare_io_binding(
  File "/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_ort.py", line 693, in prepare_io_binding
    start_logits_shape, start_logits_buffer = self.prepare_logits_buffer(
  File "/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_ort.py", line 646, in prepare_logits_buffer
    ort_type = TypeHelper.get_output_type(self.model, "logits")
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/io_binding_helper.py", line 26, in get_output_type
    raise ValueError(f"output name {name} not found")
ValueError: output name logits not found
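
From the traceback, prepare_logits_buffer looks up a single output named "logits", while the question-answering export exposes start_logits and end_logits, so the lookup fails. As a stopgap (a sketch, not an official workaround), the optimized QA model can be kept on CPU, which the repro above shows working:

# Workaround sketch: run the QA pipeline on CPU until the GPU IO-binding path is fixed
model = ORTModelForQuestionAnswering.from_pretrained(save_dir, file_name="model_optimized.onnx")
onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(onnx_qa(QA_input))
# If the installed optimum version exposes a use_io_binding flag on from_pretrained,
# disabling it might also avoid this GPU buffer-preparation path (unverified).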

Expected behavior

We expected step 2 to run without error and produce a result like the one below:

Actual CPU result (Expected GPU result):

{'score': 0.30786821246147156, 'start': 59, 'end': 132, 'answer': 'gives freedom to the user and let people easily switch between frameworks'}

Issue Analytics

  • State: closed
  • Created: 10 months ago
  • Reactions: 1
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
JingyaHuang commented, Nov 11, 2022

Hi @bugzyz,

The fix for the QA pipeline has been merged into main. Yes, feel free to test the dev version with the command.
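
(The exact command is not shown on this page; as an assumption, installing the development version directly from the GitHub repository would look like this:)

# Assumed install command, not necessarily the one referenced above
pip install --upgrade git+https://github.com/huggingface/optimum.git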

1 reaction
fxmarty commented, Nov 10, 2022
