
Dynamic quantized ORTModelForSeq2SeqLM throws error during inference

See original GitHub issue

System Info

Optimum - 1.4.1
Linux
Python - 3.7

Who can help?

@JingyaHuang @echarlaix I dynamically quantized a T5-based model fine-tuned for the text-to-text generation task, but it throws an error during inference.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

I tried to load that quantized model using the following code:


model_ort_q = ORTModelForSeq2SeqLM.from_pretrained("local_folder_path")

First it complained that the file was not found. Even when I passed the file_name, it showed the error below:


NoSuchFile: [ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from ./ort-quant-fin/encoder_model.onnx failed:Load model ./local_path/encoder_model.onnx failed. File doesn't exist
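
(For context, "pass the file_name" refers to pointing the loader at the quantized files explicitly. A minimal sketch of what that call would look like, assuming the per-component file-name arguments that ORTModelForSeq2SeqLM.from_pretrained exposed in Optimum 1.4.x and the default "_quantized" filenames; both the argument names and the filenames are assumptions here:)

from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Hypothetical quantized filenames; adjust to whatever the quantizer wrote out.
model_ort_q = ORTModelForSeq2SeqLM.from_pretrained(
    "local_folder_path",
    encoder_file_name="encoder_model_quantized.onnx",
    decoder_file_name="decoder_model_quantized.onnx",
    decoder_with_past_file_name="decoder_with_past_model_quantized.onnx",
)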

So I renamed the files by removing the "_quantized" suffix from all the ONNX model filenames, and the model then loaded successfully.
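
(For a one-off workaround like this, the rename can be scripted; a minimal sketch, assuming the hypothetical local folder from the snippet above:)

from pathlib import Path

# Strip the "_quantized" suffix from every ONNX file in the model folder.
for onnx_file in Path("local_folder_path").glob("*_quantized.onnx"):
    onnx_file.rename(onnx_file.with_name(onnx_file.name.replace("_quantized", "")))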

But when I tried to run inference, I got the error below:

text2text_generator = pipeline("text2text-generation", model=model_ort_q, tokenizer=tokenizer)
print(text2text_generator("some text"))


/usr/local/lib/python3.7/dist-packages/optimum/onnxruntime/modeling_seq2seq.py in forward(self, input_ids, attention_mask, **kwargs)
    530     ) -> BaseModelOutput:
    531 
--> 532         onnx_inputs = {"input_ids": input_ids.cpu().detach().numpy()}
    533 
    534         # Add the attention_mask inputs when needed

AttributeError: 'NoneType' object has no attribute 'cpu'

Expected behavior

The quantized ORTModelForSeq2SeqLM model should generate text during inference.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
leslyarun commented, Oct 28, 2022

@JingyaHuang Awesome, works now

0 reactions
JingyaHuang commented, Oct 28, 2022

Hi @leslyarun, ORTModel needs the model configuration (config.json), not the configuration of your quantization approach (ort_config.json). You can save the model config when loading the original model:

model = ORTModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
model.save_pretrained(onnx_path)
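
(For completeness, a minimal end-to-end sketch of the fix, reusing the from_transformers=True export flag from the snippet above (the Optimum 1.4-era API) with a hypothetical t5-small checkpoint and output folder:)

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_id = "t5-small"            # hypothetical; substitute the fine-tuned checkpoint
onnx_path = "local_folder_path"  # hypothetical output folder

# Exporting via from_transformers=True and then save_pretrained() writes
# config.json next to the ONNX files, which is what ORTModel needs at load time.
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
model.save_pretrained(onnx_path)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model_ort = ORTModelForSeq2SeqLM.from_pretrained(onnx_path)
text2text_generator = pipeline("text2text-generation", model=model_ort, tokenizer=tokenizer)
print(text2text_generator("some text"))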

