Dynamic quantized ORTModelForSeq2SeqLM throws error during inference
System Info
- Optimum: 1.4.1
- Platform: Linux
- Python: 3.7
Who can help?
@JingyaHuang @echarlaix I dynamically quantized a T5 model fine-tuned for the text-to-text generation task, but it throws an error during inference.
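The quantization script itself is not included in the issue; a minimal sketch of how such a setup is typically produced with the optimum 1.4 API, where the model name and directories are placeholder assumptions:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the fine-tuned seq2seq model to ONNX (encoder, decoder, decoder_with_past).
model = ORTModelForSeq2SeqLM.from_pretrained("my-finetuned-t5", from_transformers=True)
model.save_pretrained("onnx_model")

# Dynamic quantization: one ORTQuantizer per exported ONNX file.
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
for file_name in ("encoder_model.onnx", "decoder_model.onnx", "decoder_with_past_model.onnx"):
    quantizer = ORTQuantizer.from_pretrained("onnx_model", file_name=file_name)
    # Each call writes a *_quantized.onnx file (e.g. encoder_model_quantized.onnx) to save_dir.
    quantizer.quantize(save_dir="local_folder_path", quantization_config=qconfig)
```

Note that the output files carry a "_quantized" suffix, which is what trips up loading below.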
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
I tried to load that quantized model using the following code:
from optimum.onnxruntime import ORTModelForSeq2SeqLM
model_ort_q = ORTModelForSeq2SeqLM.from_pretrained("local_folder_path")
At first, it complained that the file was not found. Even when I passed file_name, it showed the error below:
NoSuchFile: [ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from ./ort-quant-fin/encoder_model.onnx failed:Load model ./local_path/encoder_model.onnx failed. File doesn't exist
So I renamed the files by removing the "_quantized" suffix from all the ONNX model filenames, and the model then loaded successfully (an alternative that avoids renaming is sketched below).
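For reference, the quantized files can be pointed to explicitly instead of renamed; a sketch assuming the optimum 1.4.x keyword arguments encoder_file_name, decoder_file_name, and decoder_with_past_file_name:

```python
model_ort_q = ORTModelForSeq2SeqLM.from_pretrained(
    "local_folder_path",
    encoder_file_name="encoder_model_quantized.onnx",
    decoder_file_name="decoder_model_quantized.onnx",
    decoder_with_past_file_name="decoder_with_past_model_quantized.onnx",
)
```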
But when I tried to run inference, I got the error below:
from transformers import AutoTokenizer, pipeline
tokenizer = AutoTokenizer.from_pretrained("local_folder_path")  # assuming the tokenizer was saved alongside the model
text2text_generator = pipeline("text2text-generation", model=model_ort_q, tokenizer=tokenizer)
print(text2text_generator("some text"))
/usr/local/lib/python3.7/dist-packages/optimum/onnxruntime/modeling_seq2seq.py in forward(self, input_ids, attention_mask, **kwargs)
530 ) -> BaseModelOutput:
531
--> 532 onnx_inputs = {"input_ids": input_ids.cpu().detach().numpy()}
533
534 # Add the attention_mask inputs when needed
AttributeError: 'NoneType' object has no attribute 'cpu'
Expected behavior
The quantized ORTModelForSeq2SeqLM model should generate text during inference.
Top GitHub Comments
@JingyaHuang Awesome, works now!
Hi @leslyarun, ORTModel needs the model configuration (config.json) instead of the configuration of your quantization approach (ort_config.json). You can save the model config when loading the original model:
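The code snippet that originally followed this comment was not captured; a minimal sketch of the suggestion, where the original checkpoint name is a placeholder:

```python
from transformers import AutoConfig

# Save the model's config.json next to the quantized ONNX files so that
# ORTModelForSeq2SeqLM.from_pretrained() can find it.
config = AutoConfig.from_pretrained("my-finetuned-t5")  # placeholder for the original checkpoint
config.save_pretrained("local_folder_path")
```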