Export LayoutLMv2 to onnx

I am trying to export a LayoutLMv2 model to ONNX, but the transformers library does not support this yet. I tried following the approach used for LayoutLM, but it does not work. Here is my config class for LayoutLMv2:

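from collections import OrderedDict
from pathlib import Path
from typing import Any, List, Mapping, Optional

from transformers import PretrainedConfig, PreTrainedTokenizer, TensorType, is_torch_available
from transformers.onnx import OnnxConfig, PatchingSpec, export


class LayoutLMv2OnnxConfig(OnnxConfig):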
    def __init__(
        self,
        config: PretrainedConfig,
        task: str = "default",
        patching_specs: List[PatchingSpec] = None,
    ):
        super().__init__(config, task=task, patching_specs=patching_specs)
        self.max_2d_positions = config.max_2d_position_embeddings - 1

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
                ("bbox", {0: "batch", 1: "sequence"}),
                ("image", {0: "batch", 1: "sequence"}),
                ("attention_mask", {0: "batch", 1: "sequence"}),
                ("token_type_ids", {0: "batch", 1: "sequence"}),
            ]
        )

    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
        framework: Optional[TensorType] = None,
    ) -> Mapping[str, Any]:
        """
        Generate inputs to provide to the ONNX exporter for the specific framework
        Args:
            tokenizer: The tokenizer associated with this model configuration
            batch_size: The batch size (int) to export the model for (-1 means dynamic axis)
            seq_length: The sequence length (int) to export the model for (-1 means dynamic axis)
            is_pair: Indicate if the input is a pair (sentence 1, sentence 2)
            framework: The framework (optional) the tokenizer will generate tensor for
        Returns:
            Mapping[str, Tensor] holding the kwargs to provide to the model's forward function
        """

        input_dict = super().generate_dummy_inputs(tokenizer, batch_size, seq_length, is_pair, framework)

        # Generate a dummy bbox
        box = [48, 84, 73, 128]

        if framework != TensorType.PYTORCH:
            raise NotImplementedError("Exporting LayoutLMv2 to ONNX is currently only supported for PyTorch.")

        if not is_torch_available():
            raise ValueError("Cannot generate dummy inputs without PyTorch installed.")
        import torch

        batch_size, seq_length = input_dict["input_ids"].shape
        input_dict["bbox"] = torch.tensor([*[box] * seq_length]).tile(batch_size, 1, 1)
        return input_dict

onnx_config = LayoutLMv2OnnxConfig(model.config)


export(tokenizer=tokenizer, model=model, config=onnx_config, opset=12, output=Path('onnx/layoutlmv2.onnx'))

Running the export line raises this error:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-25-99a1f167e396> in <module>()
----> 1 export(tokenizer=tokenizer, model=model, config=onnx_config, opset=12, output=Path('onnx/layoutlmv2.onnx'))

3 frames
/usr/local/lib/python3.7/dist-packages/transformers/models/layoutlmv2/tokenization_layoutlmv2.py in __call__(self, text, text_pair, boxes, word_labels, add_special_tokens, padding, truncation, max_length, stride, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
    449 
    450         words = text if text_pair is None else text_pair
--> 451         assert boxes is not None, "You must provide corresponding bounding boxes"
    452         if is_batched:
    453             assert len(words) == len(boxes), "You must provide words and boxes for an equal amount of examples"

AssertionError: You must provide corresponding bounding boxes

Top GitHub Comments

5 reactions

michaelbenayoun commented, Nov 15, 2021

It seems to come from the LayoutLMv2Tokenizer, which takes boxes (bbox) as inputs. Here you are calling super().generate_dummy_inputs, which uses the tokenizer to create dummy inputs but does not pass the boxes to it, hence the error.

There are two ways of solving this issue:

1. Support this in the base class, which could accept additional keyword arguments for cases like this one.
2. Skip the super method and implement everything in the LayoutLMv2 OnnxConfig.
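The second option could be sketched roughly as below. This is only an illustration, not the code from any merged fix: the helper names build_dummy_words_and_boxes and generate_dummy_inputs_with_boxes are hypothetical, the fixed batch_size and num_words defaults are assumptions (the base class normally resolves -1 to a concrete size), and the "image" input is not covered. It relies only on the tokenizer accepting boxes and return_tensors keyword arguments, which the signature in the traceback shows.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def build_dummy_words_and_boxes(
    batch_size: int = 2,
    num_words: int = 8,
    box: Sequence[int] = (48, 84, 73, 128),
) -> Tuple[List[List[str]], List[List[List[int]]]]:
    """Build dummy words and one bounding box per word for every example."""
    # Hypothetical helper: the sizes are fixed, assumed values.
    words = [["a"] * num_words for _ in range(batch_size)]
    boxes = [[list(box) for _ in range(num_words)] for _ in range(batch_size)]
    return words, boxes


def generate_dummy_inputs_with_boxes(
    tokenizer: Callable[..., Any],
    batch_size: int = 2,
    num_words: int = 8,
    return_tensors: Any = "pt",
) -> Dict[str, Any]:
    """Call the tokenizer directly, passing the boxes that the assertion asks for."""
    words, boxes = build_dummy_words_and_boxes(batch_size, num_words)
    # Unlike super().generate_dummy_inputs, this call provides `boxes`,
    # so the "You must provide corresponding bounding boxes" assertion is not hit.
    encoded = tokenizer(words, boxes=boxes, return_tensors=return_tensors)
    return dict(encoded)
```

Inside LayoutLMv2OnnxConfig.generate_dummy_inputs, something like generate_dummy_inputs_with_boxes(tokenizer) could then stand in for the super() call. A dummy "image" tensor would presumably still have to be added separately, since the tokenizer does not produce one.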

1 reaction

fadi212 commented, Nov 29, 2021

Hi @viantirreau @lalitr994, you can take a look at this PR and convert your model with this branch: https://github.com/huggingface/transformers/pull/14555
