Error: Expected shape from model of {} does not match actual shape of {1,1,1} for output
Problem
I’m getting the following error when I’m trying to apply static quantization (ONNX) with the `ORTQuantizer`.
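For reference, the failing static-quantization flow looks roughly like this (a minimal sketch following the `optimum` static-quantization example; the model id, dataset, and exact argument names are illustrative and may differ between `optimum` versions):

```python
from functools import partial

from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoCalibrationConfig, AutoQuantizationConfig
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative model id

# Export the model to ONNX and attach a quantizer to it
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
quantizer = ORTQuantizer.from_pretrained(model)

# Static quantization requires a calibration step
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=True, per_channel=False)

def preprocess_fn(examples, tokenizer):
    return tokenizer(examples["sentence"], padding="max_length", truncation=True)

calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=50,
    dataset_split="train",
)
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)

# The shape error is raised during the calibration / quantization step
ranges = quantizer.fit(
    dataset=calibration_dataset,
    calibration_config=calibration_config,
    operators_to_quantize=qconfig.operators_to_quantize,
)
quantizer.quantize(
    save_dir="quantized_model",
    calibration_tensors_range=ranges,
    quantization_config=qconfig,
)
```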
Tests
This error occurs for:
- my custom script
- the example code in the README.md
- the example [notebook](https://github.com/huggingface/notebooks/blob/master/examples/text_classification_quantization_ort.ipynb) in this repository
- a brand new project with only `transformers`, `datasets`, and `optimum[onnxruntime]` installed
- a brand new project with only `transformers`, `datasets`, and `optimum[onnxruntime]` (with `python -m pip install git+https://github.com/huggingface/optimum.git` installed)
More
- The resulting `model-quantized.onnx` can be loaded but produces very bad results.
- Dynamic quantization works seamlessly (see the sketch after this list).
- Using:
  - Python 3.9
  - tested on two different devices with different operating systems:
    - macOS Monterey (with Intel)
    - WSL for Windows 11 (Ubuntu)
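By contrast, the dynamic path needs no calibration step and completes without the shape error. A minimal sketch of that working path, under the same assumptions as the static sketch above:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the model to ONNX and attach a quantizer to it
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",  # illustrative model id
    export=True,
)
quantizer = ORTQuantizer.from_pretrained(model)

# Dynamic quantization: weights are quantized ahead of time and activations
# on the fly at inference, so no calibration dataset (and no fit step) is needed
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="dynamic_quantized_model", quantization_config=dqconfig)
```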
Top GitHub Comments
Hi @realjanpaulus,
Thanks for sharing your experiment results; analyzing which parts of the model are sensitive to quantization is a very interesting topic.
I would tend to say that, in general, the more calibration data you provide, the more confident we can be in the estimated quantization parameters. This is however not true for calibration methods such as minmax, which take the global minimum and maximum values (there is currently no option to compute those values using an exponential moving average). More data then results in a wider quantization range, leading to a decrease in precision and very likely a drop in the final model’s performance. Even though this does not hold for every model / task / calibration method combination, I found that for BERT models on text classification tasks, 40 to 50 examples gave good results when using the minmax calibration method.
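Concretely, that trade-off is controlled by the calibration dataset size (`num_samples`) and the `AutoCalibrationConfig` factory; a minimal sketch, assuming `quantizer`, `tokenizer`, and `preprocess_fn` are set up as in the static-quantization sketch above:

```python
from functools import partial

from optimum.onnxruntime.configuration import AutoCalibrationConfig

# 40 to 50 samples tend to be enough for BERT-style text classification
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=50,
    dataset_split="train",
)

# minmax keeps the global min/max seen over the whole calibration set,
# so adding more data can only widen the range and reduce precision
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)
```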
I hope this helps!
I will close this issue as the initial problem is now solved; if you have other questions, please feel free to open another one.
Hi @realjanpaulus,
We reported the issue to the ORT folks and it should be fixed in the next release 👍🏻.
In the meantime, I can confirm it doesn’t impact the final performance of the model:
↪️ https://github.com/microsoft/onnxruntime/issues/10504
Regarding the second point, histogram-based methods are currently failing for some parameter combinations; this should also be fixed in the next release, as the PR has been merged upstream:
➡️ https://github.com/microsoft/onnxruntime/issues/10571