Dynamic quantized ORTModelForSeq2SeqLM throws error during inference
System Info
- Optimum: 1.4.1
- Platform: Linux
- Python: 3.7
Who can help?
@JingyaHuang @echarlaix I dynamically quantized a T5 model fine-tuned for the text-to-text generation task, but it throws an error during inference.
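The quantization script itself is not included in the issue; a minimal sketch of how such a setup is typically produced with the optimum 1.4 API, where the model name and directories are placeholder assumptions:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the fine-tuned seq2seq model to ONNX (encoder, decoder, decoder_with_past).
model = ORTModelForSeq2SeqLM.from_pretrained("my-finetuned-t5", from_transformers=True)
model.save_pretrained("onnx_model")

# Dynamic quantization: one ORTQuantizer per exported ONNX file.
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
for file_name in ("encoder_model.onnx", "decoder_model.onnx", "decoder_with_past_model.onnx"):
    quantizer = ORTQuantizer.from_pretrained("onnx_model", file_name=file_name)
    # Each call writes a *_quantized.onnx file (e.g. encoder_model_quantized.onnx) to save_dir.
    quantizer.quantize(save_dir="local_folder_path", quantization_config=qconfig)
```

Note that the output files carry a "_quantized" suffix, which is what trips up loading below.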
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
I tried to load that quantized model using the following code:
from optimum.onnxruntime import ORTModelForSeq2SeqLM
model_ort_q = ORTModelForSeq2SeqLM.from_pretrained("local_folder_path")
At first, it complained that the file was not found. Even when I passed file_name, it showed the error below:
NoSuchFile: [ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from ./ort-quant-fin/encoder_model.onnx failed:Load model ./local_path/encoder_model.onnx failed. File doesn't exist
So I renamed the files by removing the "_quantized" suffix from all the ONNX model filenames, and the model then loaded successfully (an alternative that avoids renaming is sketched below).
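For reference, the quantized files can be pointed to explicitly instead of renamed; a sketch assuming the optimum 1.4.x keyword arguments encoder_file_name, decoder_file_name, and decoder_with_past_file_name:

```python
model_ort_q = ORTModelForSeq2SeqLM.from_pretrained(
    "local_folder_path",
    encoder_file_name="encoder_model_quantized.onnx",
    decoder_file_name="decoder_model_quantized.onnx",
    decoder_with_past_file_name="decoder_with_past_model_quantized.onnx",
)
```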
But when I tried to run inference, I got the error below:
from transformers import AutoTokenizer, pipeline
tokenizer = AutoTokenizer.from_pretrained("local_folder_path")  # assuming the tokenizer was saved alongside the model
text2text_generator = pipeline("text2text-generation", model=model_ort_q, tokenizer=tokenizer)
print(text2text_generator("some text"))
/usr/local/lib/python3.7/dist-packages/optimum/onnxruntime/modeling_seq2seq.py in forward(self, input_ids, attention_mask, **kwargs)
530 ) -> BaseModelOutput:
531
--> 532 onnx_inputs = {"input_ids": input_ids.cpu().detach().numpy()}
533
534 # Add the attention_mask inputs when needed
AttributeError: 'NoneType' object has no attribute 'cpu'
Expected behavior
The quantized ORTModelForSeq2SeqLM model should generate text during inference.
Top GitHub Comments
@JingyaHuang Awesome, works now!
Hi @leslyarun, ORTModel needs the model configuration (config.json) instead of the configuration of your quantization approach (ort_config.json). You can save the model config when loading the original model:
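The code snippet that originally followed this comment was not captured; a minimal sketch of the suggestion, where the original checkpoint name is a placeholder:

```python
from transformers import AutoConfig

# Save the model's config.json next to the quantized ONNX files so that
# ORTModelForSeq2SeqLM.from_pretrained() can find it.
config = AutoConfig.from_pretrained("my-finetuned-t5")  # placeholder for the original checkpoint
config.save_pretrained("local_folder_path")
```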