[BUG] ONNX optimization fails when optimizing AlbertXXL despite the weights being under 2GB

See original GitHub issue

System Info

Optimum 1.2.3[onnxruntime-gpu], PyTorch 1.12.0a0+bd13bc6, CUDA 11.6, Ubuntu 18.04, Transformers 4.19.0, ONNX Runtime nightly build (ort-nightly-gpu 1.12.0.dev20220616003), because otherwise there is an error:

Traceback (most recent call last):
  File "run_qa.py", line 524, in <module>
    main()
  File "run_qa.py", line 311, in main
    optimizer.export(
  File "/opt/conda/lib/python3.8/site-packages/optimum/onnxruntime/optimization.py", line 131, in export
    optimizer = optimize_model(
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/transformers/optimizer.py", line 215, in optimize_model
    temp_model_path = optimize_by_onnxruntime(input,
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/transformers/optimizer.py", line 96, in optimize_by_onnxruntime
    session = onnxruntime.InferenceSession(onnx_model_path, sess_options, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 335, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 363, in _create_inference_session
    raise ValueError("This ORT build has {} enabled. ".format(available_providers) +
ValueError: This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)

ONNX Runtime needs to be newer than the most recently released tag because even the newest release (1.11.1) does not yet specify an explicit execution provider when creating the inference session, hence the nightly build is used instead.
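
For reference, the requirement the nightly works around looks like this when done by hand; a minimal sketch only, where the model path is a placeholder and not an artifact produced by the script in this issue:

import onnxruntime

# Since ORT 1.9, the providers argument must be passed explicitly when
# creating an InferenceSession. "model.onnx" is a hypothetical path.
sess_options = onnxruntime.SessionOptions()
session = onnxruntime.InferenceSession(
    "model.onnx",
    sess_options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)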

Who can help?

@JingyaHuang @lewtun @mich

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

The provided run_qa.py script in the examples/onnxruntime/optimization/question-answering directory doesn’t work as expected with ahotrod/albert_xxlargev1_squad2_512.

To reproduce:

python run_qa.py \
  --model_name_or_path ahotrod/albert_xxlargev1_squad2_512 \
  --dataset_name squad_v2 \
  --optimization_level 99 \
  --do_eval \
  --output_dir /home/ubuntu/albert_xxlargev1_squad2_512_onnx_optimized \
  --execution_provider CUDAExecutionProvider \
  --optimize_for_gpu

The resulting error:

Traceback (most recent call last):
  File "run_qa.py", line 524, in <module>
    main()
  File "run_qa.py", line 311, in main
    optimizer.export(
  File "/opt/conda/lib/python3.8/site-packages/optimum/onnxruntime/optimization.py", line 142, in export
    optimizer.save_model_to_file(onnx_optimized_model_output_path, use_external_data_format)
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/transformers/models/gpt2/../../onnx_model.py", line 934, in save_model_to_file
    save_model(self.model, output_path)
  File "/opt/conda/lib/python3.8/site-packages/onnx/__init__.py", line 202, in save_model
    s = _serialize(proto)
  File "/opt/conda/lib/python3.8/site-packages/onnx/__init__.py", line 71, in _serialize
    result = proto.SerializeToString()
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 3038449703

The optimization run hits the ModelProto > 2 GB protobuf limit even though the model weights are well under 2 GB (roughly 870 MB).

Expected behavior

The optimized ONNX model is successfully saved.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
JingyaHuang commented, Aug 11, 2022

To follow up on the issue, here is the thread in the onnx repository where we continued the discussion.

It turns out that, to work within the hard protobuf size limit (2 GB), ONNX offers options for exporting large tensors to external data files, and users can tune the parameters to find the best fit.
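
For illustration, a minimal sketch of that external-data mechanism using the onnx Python API; the paths and threshold below are placeholders, not what Optimum does internally:

import onnx

# Move large tensors out of the protobuf so the remaining ModelProto stays
# under the 2 GB serialization limit. Paths are hypothetical.
model = onnx.load("model.onnx")
onnx.save_model(
    model,
    "model_external.onnx",
    save_as_external_data=True,       # write tensor data outside the proto
    all_tensors_to_one_file=True,     # keep all external data in a single file
    location="model_external.onnx.data",
    size_threshold=1024,              # only tensors larger than this (bytes) are externalized
)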

However, for some extremely large models (the case of AlbertXXL), the structural proto can still exceed 2 GB even after exporting all tensors to external files. In this case, a workaround would be to either load the fp16 model weights (if the model was also trained with mixed precision)

>>> import torch
>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained('albert-xxlarge-v1', torch_dtype=torch.float16)

or use ORTQuantizer to proceed with the quantization.
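
For completeness, a rough sketch of the ORTQuantizer route, written against the Optimum 1.2.x-era API from memory; the exact method names and arguments should be treated as assumptions and checked against the installed Optimum version:

from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Assumed Optimum 1.2.x-style usage: dynamic quantization targeting AVX-512 VNNI.
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer = ORTQuantizer.from_pretrained(
    "ahotrod/albert_xxlargev1_squad2_512", feature="question-answering"
)
quantizer.export(
    onnx_model_path="model.onnx",                      # hypothetical exported model
    onnx_quantized_model_output_path="model-quantized.onnx",
    quantization_config=qconfig,
)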

A PR is in progress to improve the compatibility of ORTOptimizer and ORTQuantizer for the case of large ONNX protos.

0 reactions
JingyaHuang commented, Aug 3, 2022

BTW, if you are not using a vision model, setting optimization_level=2 is generally good enough.
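
For reference, the same level can be set programmatically when driving the optimizer yourself; a minimal sketch assuming Optimum's OptimizationConfig is used directly rather than the script's --optimization_level flag:

from optimum.onnxruntime.configuration import OptimizationConfig

# Level 2 enables ONNX Runtime's extended graph optimizations, which is
# usually sufficient outside of vision models; 99 turns everything on,
# including the hardware-specific layout optimizations.
optimization_config = OptimizationConfig(optimization_level=2)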
