ONNX conversion from VisionEncoderDecoderModel with different dimensions
System Info
- transformers version: 4.23.0.dev0
- Platform: Linux-4.4.0-87-generic-x86_64-with-glibc2.23
- Python version: 3.9.13
- Huggingface_hub version: 0.10.0
- PyTorch version (GPU?): 1.12.1 (True)
Who can help?
@NielsRogge, @patrickvonplaten
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
I am trying to convert a VisionEncoderDecoder model to ONNX using the feature that was recently merged in https://github.com/huggingface/transformers/pull/19254. However, when the encoder and decoder are pretrained models with different hidden sizes, the conversion fails with the error shown below.
Model Load & Save
from transformers import VisionEncoderDecoderModel, BertTokenizer, AutoFeatureExtractor

encoder_name_or_path = "hf-internal-testing/tiny-random-vit"
decoder_name_or_path = "fnlp/bart-base-chinese"

# Combine a pretrained vision encoder with a pretrained text decoder
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    encoder_name_or_path,
    decoder_name_or_path,
)
tokenizer = BertTokenizer.from_pretrained(decoder_name_or_path)
feature_extractor = AutoFeatureExtractor.from_pretrained(encoder_name_or_path)

# Save model, feature extractor and tokenizer to one directory for the ONNX export
output_dir = "outputs"
model.save_pretrained(output_dir)
feature_extractor.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
Model Structure
VisionEncoderDecoderModel(
  (encoder): SwinModel(...)
  (decoder): BartForCausalLM(...)
  (enc_to_dec_proj): Linear(in_features=32, out_features=768, bias=True)
)
Because the hidden sizes differ, the model contains an extra linear layer that projects the encoder hidden states to the decoder dimension. It is created in modeling_vision_encoder_decoder.py#L217:
# encoder outputs might need to be projected to different dimension for decoder
if (
    self.encoder.config.hidden_size != self.decoder.config.hidden_size
    and self.decoder.config.cross_attention_hidden_size is None
):
    self.enc_to_dec_proj = nn.Linear(self.encoder.config.hidden_size, self.decoder.config.hidden_size)
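The condition above can be restated as a small standalone check. This is a minimal sketch in plain Python, not the library code: `ToyConfig` and `needs_projection` are hypothetical names introduced only for illustration, and the sizes are the ones reported in this issue (32 and 768).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    # Hypothetical stand-in for a model config; only the fields the check reads.
    hidden_size: int
    cross_attention_hidden_size: Optional[int] = None


def needs_projection(encoder_cfg: ToyConfig, decoder_cfg: ToyConfig) -> bool:
    # Mirrors the quoted condition: project only when the sizes differ and the
    # decoder does not declare its own cross-attention input size.
    return (
        encoder_cfg.hidden_size != decoder_cfg.hidden_size
        and decoder_cfg.cross_attention_hidden_size is None
    )


# Sizes from this issue: encoder hidden size 32, decoder hidden size 768.
print(needs_projection(ToyConfig(32), ToyConfig(768)))
# Equal sizes: no projection layer is created.
print(needs_projection(ToyConfig(768), ToyConfig(768)))
```

When the check is true, the exported graph has to include the projection somewhere between the encoder output and the decoder's cross-attention input.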
Conversion to ONNX
python -m transformers.onnx --model=outputs/ --feature=vision2seq-lm onnx/ --atol 1e-3
Output:
Traceback (most recent call last):
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/backup2/mkf/transformers/src/transformers/onnx/__main__.py", line 180, in <module>
main()
File "/backup2/mkf/transformers/src/transformers/onnx/__main__.py", line 118, in main
onnx_inputs, onnx_outputs = export(
File "/backup2/mkf/transformers/src/transformers/onnx/convert.py", line 339, in export
return export_pytorch(preprocessor, model, config, opset, output, tokenizer=tokenizer, device=device)
File "/backup2/mkf/transformers/src/transformers/onnx/convert.py", line 192, in export_pytorch
onnx_export(
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/onnx/__init__.py", line 350, in export
return utils.export(
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/onnx/utils.py", line 163, in export
_export(
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/onnx/utils.py", line 1074, in _export
graph, params_dict, torch_out = _model_to_graph(
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/onnx/utils.py", line 727, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args)
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/onnx/utils.py", line 602, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/onnx/utils.py", line 517, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/jit/_trace.py", line 1175, in _get_trace_graph
outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/jit/_trace.py", line 127, in forward
graph, out = torch._C._create_graph_by_tracing(
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/jit/_trace.py", line 118, in wrapper
outs.append(self.inner(*trace_inputs))
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in _slow_forward
result = self.forward(*input, **kwargs)
File "/backup2/mkf/transformers/src/transformers/models/bart/modeling_bart.py", line 1851, in forward
outputs = self.model.decoder(
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in _slow_forward
result = self.forward(*input, **kwargs)
File "/backup2/mkf/transformers/src/transformers/models/bart/modeling_bart.py", line 1104, in forward
layer_outputs = decoder_layer(
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in _slow_forward
result = self.forward(*input, **kwargs)
File "/backup2/mkf/transformers/src/transformers/models/bart/modeling_bart.py", line 439, in forward
hidden_states, cross_attn_weights, cross_attn_present_key_value = self.encoder_attn(
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in _slow_forward
result = self.forward(*input, **kwargs)
File "/backup2/mkf/transformers/src/transformers/models/bart/modeling_bart.py", line 201, in forward
key_states = self._shape(self.k_proj(key_value_states), -1, bsz)
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (16x32 and 768x768)
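The shapes in the last line of the traceback are consistent with the encoder output reaching the decoder's cross-attention without being projected. `F.linear(input, weight)` computes `input @ weight.T`, so the last dimension of `input` must equal the second dimension of `weight`. The sketch below reproduces that shape rule in plain Python. The helper `linear_output_shape` is hypothetical and written only for illustration; the shapes come from the error message and from the `enc_to_dec_proj` layer shown earlier.

```python
def linear_output_shape(input_shape, weight_shape):
    # nn.Linear stores its weight as (out_features, in_features) and computes
    # input @ weight.T, so the input's last dim must equal in_features.
    rows, in_dim = input_shape
    out_features, in_features = weight_shape
    if in_dim != in_features:
        raise ValueError(
            f"cannot multiply {rows}x{in_dim} by {in_features}x{out_features}"
        )
    return (rows, out_features)


encoder_out = (16, 32)       # flattened encoder hidden states, from the error
k_proj_weight = (768, 768)   # decoder cross-attention k_proj weight

try:
    linear_output_shape(encoder_out, k_proj_weight)
except ValueError as err:
    print("unprojected:", err)

# Applying enc_to_dec_proj (in_features=32, out_features=768) first
# gives the decoder an input of the size it expects.
projected = linear_output_shape(encoder_out, (768, 32))
print("projected:", projected, "->", linear_output_shape(projected, k_proj_weight))
```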
Expected behavior
It seems that the existing ONNX conversion for EncoderDecoderModel only converts the encoder and decoder, and ignores this linear layer. If I change the model to microsoft/trocr-base-handwritten, which has a similar structure and the same dimensions (i.e. no linear layer), the conversion works.
python -m transformers.onnx --model=microsoft/trocr-base-handwritten --feature=vision2seq-lm trocr_onnx/ --atol 1e-3
Thanks a lot for looking into it 😃
Top GitHub Comments
Hi @Fritskee , for ORT inference you’ll have to roll your own generation loop with ONNX Runtime to run the inference. The above code runs decoder with SL 384 with one forward pass which will give you incorrect results.
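A hand-rolled generation loop of the kind described here could look roughly like the sketch below. It is plain Python with no ONNX Runtime calls: `decoder_step` is a hypothetical callable standing in for one run of the exported decoder, assumed to return next-token scores for the last position, and the token ids are made up. The point it illustrates is that the decoder is called once per generated token on a growing sequence, rather than once on a fixed-length input.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    decoder_step: Callable[[List[int], object], Sequence[float]],
    encoder_hidden_states: object,
    start_token_id: int,
    eos_token_id: int,
    max_length: int = 20,
) -> List[int]:
    # Start from the decoder start token and extend one token at a time.
    tokens = [start_token_id]
    while len(tokens) < max_length:
        scores = decoder_step(tokens, encoder_hidden_states)
        # Greedy choice: take the highest-scoring token id.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_token_id:
            break
    return tokens


def toy_decoder_step(tokens, encoder_hidden_states):
    # Toy stand-in for a decoder session: favors the id after the last token,
    # over a 5-token vocabulary. A real step would run the ONNX decoder.
    vocab_size = 5
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


print(greedy_generate(toy_decoder_step, None, start_token_id=2, eos_token_id=0))
```

Greedy selection is used here only because it is the simplest strategy; beam search or sampling would replace the `max(...)` line.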
You can wrap your ORTEncoder and ORTDecoder in an ORTModelForVision2Seq.
The class will soon be implemented in optimum for easier inference. Stay tuned!

@mht-sharma Thanks for the example! I tried implementing it. For the further implementation I looked at optimum/pipelines.py and at optimum/onnxruntime/modeling_seq2seq.py.
Basically I took the examples from modeling_seq2seq.py for the ORTEncoder and ORTDecoder, and I took your example from above and initialized the ORTModelForVision2Seq(VisionEncoderDecoderModel) like this:
The encoder_path is the path to the encoder.onnx file, and the decoder path is the path to decoder.onnx.
For your example, the ORTEncoder is initialized like this:
When I initialize the Onnx InferenceSessions as shown in the first code block of this message, I get the following error:
self.encoder = ORTEncoder(onnxruntime.InferenceSession(c.encoder_path, providers=["CPUExecutionProvider"]), device='cpu')
File "C:\Users\FrCa\Miniconda3\envs\onnxfix\lib\site-packages\torch\nn\modules\module.py", line 1242, in __setattr__
raise TypeError("cannot assign '{}' as child module '{}' "
TypeError: cannot assign '__main__.ORTEncoder' as child module 'encoder' (torch.nn.Module or None expected)
python-BaseException
The ORTEncoder seems to expect a path to a PyTorch model for its session, which seems odd. I am currently passing the ONNX-converted encoder to ORTEncoder, but due to the error, I have also tried passing the equivalent .pth model. Additionally, I also tried passing None (which doesn't make much sense, but it says it is a possibility). Both of them also give errors.
EDIT:
I did find that by not adding the superclass VisionEncoderDecoderModel, the model can initialize both the ORTEncoder and ORTDecoder. However, this causes the code to break, because the model does need the config attribute to work with the example that is provided here.
model.config.decoder_start_token_id = 2
AttributeError: 'ORTModelForVision2Seq' object has no attribute 'config'
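The `TypeError` above is raised by `torch.nn.Module.__setattr__`: once a name such as `encoder` is registered as a child module, only another `torch.nn.Module` (or `None`) can be assigned to it. That is most likely what happens when the class inherits from `VisionEncoderDecoderModel`, which registers `encoder` and `decoder` in its constructor. One possible way around both errors is a plain wrapper that does not subclass `torch.nn.Module` and carries the config itself. The sketch below is an illustration under those assumptions, not the optimum implementation: `PlainVision2SeqWrapper` is a hypothetical name, and `SimpleNamespace` stands in for the real config object and for the ONNX-backed encoder and decoder.

```python
from types import SimpleNamespace


class PlainVision2SeqWrapper:
    # Hypothetical wrapper: a plain Python class, so attribute assignment is
    # not restricted to torch.nn.Module instances.
    def __init__(self, encoder, decoder, config):
        self.encoder = encoder
        self.decoder = decoder
        self.config = config


# Stand-ins for the ONNX-backed encoder/decoder and the model config.
encoder = SimpleNamespace(name="encoder.onnx")
decoder = SimpleNamespace(name="decoder.onnx")
config = SimpleNamespace(decoder_start_token_id=None)

model = PlainVision2SeqWrapper(encoder, decoder, config)
model.config.decoder_start_token_id = 2  # the assignment that failed above
print(model.config.decoder_start_token_id)
```

In a real setup the config would presumably be loaded from the saved model directory rather than built by hand.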