ONNX conversion from VisionEncoderDecoderModel with different dimensions


System Info

  • transformers version: 4.23.0.dev0
  • Platform: Linux-4.4.0-87-generic-x86_64-with-glibc2.23
  • Python version: 3.9.13
  • Huggingface_hub version: 0.10.0
  • PyTorch version (GPU?): 1.12.1 (True)

Who can help?

@NielsRogge, @patrickvonplaten

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

I am trying to convert a VisionEncoderDecoder model to ONNX using the feature that was recently merged in https://github.com/huggingface/transformers/pull/19254. However, when the two pretrained models have different hidden dimensions, the conversion fails with the error below.

Model Load & Save

from transformers import VisionEncoderDecoderModel, BertTokenizer, AutoFeatureExtractor

encoder_name_or_path = "hf-internal-testing/tiny-random-vit"
decoder_name_or_path = "fnlp/bart-base-chinese"
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    encoder_name_or_path,
    decoder_name_or_path,
)
tokenizer = BertTokenizer.from_pretrained(decoder_name_or_path)
feature_extractor = AutoFeatureExtractor.from_pretrained(encoder_name_or_path)

output_dir = "outputs"
model.save_pretrained(output_dir)
feature_extractor.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

Model Structure

VisionEncoderDecoderModel(
  (encoder): SwinModel(...)
  (decoder): BartForCausalLM(...)
  (enc_to_dec_proj): Linear(in_features=32, out_features=768, bias=True)
)

In this case a linear layer is created to project the encoder hidden states, in modeling_vision_encoder_decoder.py#L217:

# encoder outputs might need to be projected to different dimension for decoder
if (
    self.encoder.config.hidden_size != self.decoder.config.hidden_size
    and self.decoder.config.cross_attention_hidden_size is None
):
    self.enc_to_dec_proj = nn.Linear(self.encoder.config.hidden_size, self.decoder.config.hidden_size)
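
For reference, a quick check (not from the original report) that shows why this projection is created for the two checkpoints used above; the expected values are inferred from the model structure shown earlier:

print(model.encoder.config.hidden_size)  # 32 for the tiny ViT encoder
print(model.decoder.config.hidden_size)  # 768 for the bart-base-chinese decoder
print(model.enc_to_dec_proj)             # Linear(in_features=32, out_features=768, bias=True)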

Conversion to ONNX

python -m transformers.onnx --model=outputs/ --feature=vision2seq-lm onnx/ --atol 1e-3

Output:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/backup2/mkf/transformers/src/transformers/onnx/__main__.py", line 180, in <module>
    main()
  File "/backup2/mkf/transformers/src/transformers/onnx/__main__.py", line 118, in main
    onnx_inputs, onnx_outputs = export(
  File "/backup2/mkf/transformers/src/transformers/onnx/convert.py", line 339, in export
    return export_pytorch(preprocessor, model, config, opset, output, tokenizer=tokenizer, device=device)
  File "/backup2/mkf/transformers/src/transformers/onnx/convert.py", line 192, in export_pytorch
    onnx_export(
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/onnx/__init__.py", line 350, in export
    return utils.export(
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/onnx/utils.py", line 163, in export
    _export(
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/onnx/utils.py", line 1074, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/onnx/utils.py", line 727, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/onnx/utils.py", line 602, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/onnx/utils.py", line 517, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/jit/_trace.py", line 1175, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/backup2/mkf/transformers/src/transformers/models/bart/modeling_bart.py", line 1851, in forward
    outputs = self.model.decoder(
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/backup2/mkf/transformers/src/transformers/models/bart/modeling_bart.py", line 1104, in forward
    layer_outputs = decoder_layer(
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/backup2/mkf/transformers/src/transformers/models/bart/modeling_bart.py", line 439, in forward
    hidden_states, cross_attn_weights, cross_attn_present_key_value = self.encoder_attn(
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/backup2/mkf/transformers/src/transformers/models/bart/modeling_bart.py", line 201, in forward
    key_states = self._shape(self.k_proj(key_value_states), -1, bsz)
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda3/envs/swinocr/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (16x32 and 768x768)

Expected behavior

It seems that the existing ONNX conversion for EncoderDecoderModel only exports the encoder and decoder and ignores this linear layer. If I change the model to microsoft/trocr-base-handwritten, which has a similar structure but matching encoder/decoder hidden sizes (i.e. no linear layer), the conversion works:

python -m transformers.onnx --model=microsoft/trocr-base-handwritten --feature=vision2seq-lm trocr_onnx/ --atol 1e-3
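
One possible workaround is a rough sketch like the following (the wrapper below is hypothetical and untested here): fold enc_to_dec_proj into the encoder graph before export, so the exported decoder keeps receiving encoder hidden states that already match its hidden size.

import torch

# Hypothetical wrapper: export encoder + projection as a single ONNX graph so the
# decoder's cross-attention sees 768-dim states instead of the raw 32-dim ones.
class EncoderWithProjection(torch.nn.Module):
    def __init__(self, encoder, enc_to_dec_proj):
        super().__init__()
        self.encoder = encoder
        self.enc_to_dec_proj = enc_to_dec_proj

    def forward(self, pixel_values):
        encoder_outputs = self.encoder(pixel_values=pixel_values)
        # Project the encoder states to the decoder's hidden size
        return self.enc_to_dec_proj(encoder_outputs.last_hidden_state)

wrapped_encoder = EncoderWithProjection(model.encoder, model.enc_to_dec_proj)
dummy_pixel_values = torch.randn(1, 3, 224, 224)  # adjust to the encoder's expected image size
torch.onnx.export(
    wrapped_encoder,
    (dummy_pixel_values,),
    "encoder_with_projection.onnx",
    input_names=["pixel_values"],
    output_names=["encoder_hidden_states"],
    dynamic_axes={"pixel_values": {0: "batch"}, "encoder_hidden_states": {0: "batch"}},
    opset_version=13,
)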

Thanks a lot for looking into it 😃

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 2
  • Comments: 16 (9 by maintainers)

Top GitHub Comments

2 reactions
mht-sharma commented, Nov 4, 2022

Hi @Fritskee, for ORT inference you’ll have to roll your own generation loop on top of ONNX Runtime. The code above runs the decoder with a sequence length (SL) of 384 in a single forward pass, which will give you incorrect results.

You can wrap your ORTEncoder and ORTDecoder in an ORTModelForVision2Seq:


import time
from typing import Optional, Tuple

import torch
from transformers import AutoConfig, VisionEncoderDecoderModel
from transformers.modeling_outputs import Seq2SeqLMOutput


class ORTModelForVision2Seq(VisionEncoderDecoderModel):
    def __init__(self, *args, **kwargs):
        # model_name is the VisionEncoderDecoder checkpoint, assumed defined elsewhere
        config = AutoConfig.from_pretrained(model_name)
        super().__init__(config)
        self._device = "cpu"

        # Placeholders: wrap the ONNX Runtime sessions of the exported encoder/decoder here
        self.encoder = ORTEncoder()
        self.decoder = ORTDecoder()

    def forward(
        self,
        pixel_values: Optional[torch.FloatTensor] = None,
        decoder_input_ids: Optional[torch.LongTensor] = None,
        encoder_outputs: Optional[Tuple[Tuple[torch.Tensor]]] = None,
        **kwargs,
    ) -> Seq2SeqLMOutput:

        # Encode if needed: first prediction pass
        if encoder_outputs is None:
            encoder_outputs = self.encoder(pixel_values=pixel_values)

        # Decode
        decoder_attention_mask = decoder_input_ids.new_ones(decoder_input_ids.shape)
        decoder_outputs = self.decoder(
            input_ids=decoder_input_ids,
            attention_mask=decoder_attention_mask,
            encoder_hidden_states=encoder_outputs.last_hidden_state,
        )

        return Seq2SeqLMOutput(
            logits=decoder_outputs.logits,
        )

    def prepare_inputs_for_generation(self, input_ids, attention_mask=None, encoder_outputs=None, **kwargs):
        return {
            "decoder_input_ids": input_ids,
            "decoder_attention_mask": input_ids,  # unused: forward() rebuilds the mask from decoder_input_ids
            "encoder_outputs": encoder_outputs,
        }

model = ORTModelForVision2Seq()

start = time.time()
model.config.decoder_start_token_id = 2
model.config.pad_token_id = processor.tokenizer.pad_token_id
model.config.eos_token_id = processor.tokenizer.sep_token_id
model.config.vocab_size = model.config.decoder.vocab_size

generated_ids = model.generate(pixel_values.to(device))
end = time.time()
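
For completeness, a minimal sketch (assumptions: model_name and a local sample.png; not from the thread) of how processor, pixel_values, and device might be prepared before running the snippet above, for a TrOCR-style checkpoint:

from PIL import Image
from transformers import TrOCRProcessor

processor = TrOCRProcessor.from_pretrained(model_name)  # model_name as assumed above
image = Image.open("sample.png").convert("RGB")          # placeholder input image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
device = "cpu"                                            # matches the _device used in the wrapper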

The class will soon be implemented in Optimum for easier inference. Stay tuned!

1 reaction
Fritskee commented, Nov 4, 2022

@mht-sharma Thanks for the example! I tried implementing it. For the implementation I looked at optimum/pipelines.py and optimum/onnxruntime/modeling_seq2seq.py.

Basically I took the ORTEncoder and ORTDecoder examples from modeling_seq2seq.py, combined them with your example above, and initialized the ORTModelForVision2Seq(VisionEncoderDecoderModel) like this:

class ORTModelForVision2Seq(VisionEncoderDecoderModel):
    def __init__(self, *args, **kwargs):
        config = AutoConfig.from_pretrained('microsoft/trocr-base-str')
        super().__init__(config)
        self._device = "cpu"
        self.encoder = ORTEncoder(onnxruntime.InferenceSession(c.encoder_path, providers=["CPUExecutionProvider"]), device='cpu')
        self.decoder = ORTDecoder(onnxruntime.InferenceSession(c.decoder_path, providers=["CPUExecutionProvider"]), device='cpu')

Here encoder_path is the path to the exported encoder.onnx file and decoder_path is the path to decoder.onnx.

For your example, the ORTEncoder is initialized like this:

class ORTEncoder:
    """
    Encoder model for ONNX Runtime inference.
    Arguments:
        session (`onnxruntime.InferenceSession`):
            The ONNX Runtime inference session associated to the encoder.
    """

    def __init__(
        self, session: onnxruntime.InferenceSession, device: torch.device, main_input_name: str = "input_ids"
    ):
        self.session = session
        self._device = device
        self.main_input_name = main_input_name
        self.input_names = {input_key.name: idx for idx, input_key in enumerate(self.session.get_inputs())}
        self.output_names = {output_key.name: idx for idx, output_key in enumerate(self.session.get_outputs())}

When I initialize the ONNX InferenceSessions as shown in the first code block of this message, I get the following error:

    self.encoder = ORTEncoder(onnxruntime.InferenceSession(c.encoder_path, providers=["CPUExecutionProvider"]), device='cpu')
  File "C:\Users\FrCa\Miniconda3\envs\onnxfix\lib\site-packages\torch\nn\modules\module.py", line 1242, in __setattr__
    raise TypeError("cannot assign '{}' as child module '{}' "
TypeError: cannot assign '__main__.ORTEncoder' as child module 'encoder' (torch.nn.Module or None expected)

Judging by the error, the encoder attribute seems to expect a PyTorch module rather than this session wrapper, which seems odd. I am currently passing the ONNX-converted encoder to ORTEncoder, but because of the error I have also tried passing the equivalent .pth model. Additionally, I tried passing None (which doesn’t make much sense, but the message says it is a possibility). Both of these also give errors.
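
The TypeError itself comes from torch.nn.Module.__setattr__: because ORTModelForVision2Seq subclasses VisionEncoderDecoderModel (an nn.Module), whatever is assigned to self.encoder or self.decoder must itself be an nn.Module (or None). A minimal sketch of one way around this, keeping the session-based wrapper from above (the forward method, input/output names, and BaseModelOutput wrapping are assumptions, not tested in this thread):

import torch
import onnxruntime
from transformers.modeling_outputs import BaseModelOutput

class ORTEncoder(torch.nn.Module):
    """Session wrapper that is also an nn.Module, so VisionEncoderDecoderModel
    accepts it as the 'encoder' child module."""

    def __init__(self, session: onnxruntime.InferenceSession, device: str = "cpu",
                 main_input_name: str = "input_ids"):
        super().__init__()  # make this a real nn.Module before assigning attributes
        self.session = session
        self._device = device
        self.main_input_name = main_input_name
        self.input_names = {inp.name: idx for idx, inp in enumerate(session.get_inputs())}
        self.output_names = {out.name: idx for idx, out in enumerate(session.get_outputs())}

    def forward(self, pixel_values):
        # Input/output names assume the exported encoder.onnx; check self.input_names if they differ
        outputs = self.session.run(None, {"pixel_values": pixel_values.cpu().numpy()})
        return BaseModelOutput(last_hidden_state=torch.from_numpy(outputs[0]))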

EDIT:

I did find that if I don’t subclass VisionEncoderDecoderModel, the model can initialize both the ORTEncoder and ORTDecoder. However, the code then breaks further down, because the model needs the config attribute to work with the example provided here.

class ORTModelForVision2Seq():
    def __init__(self, *args, **kwargs):
        self._device = "cpu"
        self.encoder = ORTEncoder(onnxruntime.InferenceSession(c.encoder_path, providers=["CPUExecutionProvider"]), device='cpu')
        self.decoder = ORTDecoder(onnxruntime.InferenceSession(c.decoder_path, providers=["CPUExecutionProvider"]), device='cpu')

model.config.decoder_start_token_id = 2
AttributeError: 'ORTModelForVision2Seq' object has no attribute 'config'
