Cannot export Donut models to ONNX
System Info
- transformers version: 4.24.0.dev0
- Platform: macOS-10.16-x86_64-i386-64bit
- Python version: 3.8.13
- Huggingface_hub version: 0.10.0
- PyTorch version (GPU?): 1.12.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
It seems that the default tolerance of 1e-5 in the ONNX configuration for vision-encoder-decoder models is too small for Donut checkpoints: an atol of roughly 5e-3 to 9e-3 is currently needed. As a result, many (all?) Donut checkpoints can't be exported using the default values in the CLI.
Having said that, the relatively large discrepancy in the exported models suggests there is a deeper issue with tracing these models, and it would be great to eliminate this potential source of error before increasing the default value for atol.
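For reference, here is a minimal sketch of how the discrepancy can be reproduced outside the CLI by comparing the PyTorch encoder against the exported graph directly. The encoder_model.onnx filename and the 2560x1920 input size are assumptions (based on how the CLI splits vision-encoder-decoder exports and on the donut-base preprocessing config), so check them against your local export:

import numpy as np
import onnxruntime as ort
import torch
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-docvqa"
).eval()

# donut-base checkpoints expect 2560x1920 inputs (assumption taken from the
# checkpoint's preprocessing config; adjust if your checkpoint differs)
pixel_values = torch.randn(1, 3, 2560, 1920)

with torch.no_grad():
    reference = model.encoder(pixel_values).last_hidden_state.numpy()

# Filename is an assumption; check what the CLI actually wrote under onnx/
session = ort.InferenceSession("onnx/encoder_model.onnx")
input_name = session.get_inputs()[0].name
onnx_output = session.run(None, {input_name: pixel_values.numpy()})[0]

print("max abs diff:", np.abs(reference - onnx_output).max())  # ~5e-3 to 9e-3 here
np.testing.assert_allclose(reference, onnx_output, atol=1e-5)  # fails at the default atol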
Steps to reproduce:

- Pick one of the Donut checkpoints from the naver-clova-ix org on the Hub
- Export the model using the ONNX CLI, e.g.

python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-docvqa --feature=vision2seq-lm onnx/
- The above gives the following error:
ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.0091094970703125 for [ -0.6990948 -49.217014 3.7758636 ... 3.2241364 2.7353969
-51.43289 ] vs [ -0.6989002 -49.215897 3.7760048 ... 3.223978 2.7355423
-51.433964 ]
Full stack trace
Framework not requested. Using torch to export to ONNX.
Downloading: 100%|██████████| 4.74k/4.74k [00:00<00:00, 791kB/s]
Downloading: 100%|██████████| 803M/803M [00:09<00:00, 81.2MB/s]
/Users/lewtun/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:2895.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Downloading: 100%|██████████| 363/363 [00:00<00:00, 85.0kB/s]
Using framework PyTorch: 1.12.1
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:230: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if num_channels != self.num_channels:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:220: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if width % self.patch_size[1] != 0:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:223: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if height % self.patch_size[0] != 0:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:536: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if min(input_resolution) <= self.window_size:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:136: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
batch_size, height // window_size, window_size, width // window_size, window_size, num_channels
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:148: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
windows = windows.view(-1, height // window_size, width // window_size, window_size, window_size, num_channels)
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:622: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
was_padded = pad_values[3] > 0 or pad_values[5] > 0
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:623: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if was_padded:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:411: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
batch_size // mask_shape, mask_shape, self.num_attention_heads, dim, dim
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:682: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
height_downsampled, width_downsampled = (height + 1) // 2, (width + 1) // 2
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:266: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
should_pad = (height % 2 == 1) or (width % 2 == 1)
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:267: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if should_pad:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
[the warning above is repeated 40 times in the original log]
Validating ONNX model...
    -[✓] ONNX model output names match reference model ({'last_hidden_state'})
    - Validating ONNX Model output "last_hidden_state":
        -[✓] (3, 4800, 1024) matches (3, 4800, 1024)
        -[x] values not close enough (atol: 0.0001)
Traceback (most recent call last):
File "/Users/lewtun/miniconda3/envs/transformers/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/lewtun/miniconda3/envs/transformers/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/lewtun/git/hf/transformers/src/transformers/onnx/__main__.py", line 180, in <module>
main()
File "/Users/lewtun/git/hf/transformers/src/transformers/onnx/__main__.py", line 107, in main
validate_model_outputs(
File "/Users/lewtun/git/hf/transformers/src/transformers/onnx/convert.py", line 455, in validate_model_outputs
raise ValueError(
ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.0091094970703125 for [ -0.6990948 -49.217014 3.7758636 ... 3.2241364 2.7353969
-51.43289 ] vs [ -0.6989002 -49.215897 3.7760048 ... 3.223978 2.7355423
-51.433964 ]
Expected behavior
Donut checkpoints can be exported to ONNX, either via a better default value for atol or via changes to the modeling code that yield much closer agreement between the original and exported models.
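On the first option: the exporter reads the tolerance from the ONNX config's atol_for_validation property, so a Donut-specific config could loosen it without touching other models. A minimal sketch, assuming a hypothetical DonutSwinOnnxConfig (no such class exists in transformers today):

from collections import OrderedDict
from typing import Mapping

from transformers.onnx import OnnxConfig


class DonutSwinOnnxConfig(OnnxConfig):  # hypothetical class, for illustration only
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        # A Swin-style vision encoder takes a single image tensor
        return OrderedDict(
            [("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"})]
        )

    @property
    def atol_for_validation(self) -> float:
        # Loosened to cover the 5e-3 to 9e-3 gap observed above
        return 1e-2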
Top GitHub Comments
python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-cord-v2 --feature=vision2seq-lm scratch/onnx --atol 1e-2

With --atol 1e-2 it works, but that tolerance is quite loose. I think it is better to convert the encoder and decoder separately and pipeline them together, as sketched below.
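A rough sketch of that split-export idea, driving a greedy decode loop over two onnxruntime sessions. The file names, graph input/output names, shapes, and token ids are assumptions and should be checked against the actual export:

import numpy as np
import onnxruntime as ort

encoder = ort.InferenceSession("onnx/encoder_model.onnx")
decoder = ort.InferenceSession("onnx/decoder_model.onnx")

# Preprocessed document image, e.g. from DonutProcessor (shape is an assumption)
pixel_values = np.random.randn(1, 3, 2560, 1920).astype(np.float32)
encoder_hidden_states = encoder.run(None, {"pixel_values": pixel_values})[0]

decoder_start_token_id = 0  # hypothetical; read it from model.config
eos_token_id = 2            # hypothetical; read it from model.config
input_ids = np.array([[decoder_start_token_id]], dtype=np.int64)

# Plain greedy decoding; the full prefix is re-run each step since this sketch
# assumes a decoder exported without past_key_values
for _ in range(128):
    logits = decoder.run(
        None,
        {"input_ids": input_ids, "encoder_hidden_states": encoder_hidden_states},
    )[0]
    next_token = logits[:, -1].argmax(axis=-1).reshape(1, 1).astype(np.int64)
    input_ids = np.concatenate([input_ids, next_token], axis=-1)
    if next_token.item() == eos_token_id:
        break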
cc @mht-sharma, would you mind taking a look at this? It might be related to some of the subtleties you noticed with Whisper around passing encoder outputs through the model vs. using the getters.