[BUG] ONNX optimization fails when optimizing AlbertXXL despite the weights being under 2GB
System Info
Optimum 1.2.3[onnxruntime-gpu],
PyTorch 1.12.0a0+bd13bc6,
CUDA 11.6, Ubuntu 18.04,
Transformers 4.19.0,
Onnxruntime nightly build (ort-nightly-gpu 1.12.0.dev20220616003), because otherwise there's an error:

Traceback (most recent call last):
  File "run_qa.py", line 524, in <module>
    main()
  File "run_qa.py", line 311, in main
    optimizer.export(
  File "/opt/conda/lib/python3.8/site-packages/optimum/onnxruntime/optimization.py", line 131, in export
    optimizer = optimize_model(
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/transformers/optimizer.py", line 215, in optimize_model
    temp_model_path = optimize_by_onnxruntime(input,
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/transformers/optimizer.py", line 96, in optimize_by_onnxruntime
    session = onnxruntime.InferenceSession(onnx_model_path, sess_options, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 335, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 363, in _create_inference_session
    raise ValueError("This ORT build has {} enabled. ".format(available_providers) +
ValueError: This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)
Onnxruntime needs to be newer than the most recently released tag because even the newest release (1.11.1) doesn't yet pass an explicit execution provider in this code path, hence the nightly build is used instead.
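For context, the error message itself spells out the fix it expects: the providers must be passed explicitly when the session is created. A minimal sketch of that call (the model path is a placeholder):

```python
import onnxruntime

# Since ORT 1.9, execution providers must be set explicitly.
# "model.onnx" is a placeholder path for illustration.
sess_options = onnxruntime.SessionOptions()
session = onnxruntime.InferenceSession(
    "model.onnx",
    sess_options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```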
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
The provided run_qa.py script in the examples/onnxruntime/optimization/question-answering folder doesn't work as expected with ahotrod/albert_xxlargev1_squad2_512.
To reproduce:
python run_qa.py \
  --model_name_or_path ahotrod/albert_xxlargev1_squad2_512 \
  --dataset_name squad_v2 \
  --optimization_level 99 \
  --do_eval \
  --output_dir /home/ubuntu/albert_xxlargev1_squad2_512_onnx_optimized \
  --execution_provider CUDAExecutionProvider \
  --optimize_for_gpu
The resulting error:
Traceback (most recent call last):
  File "run_qa.py", line 524, in <module>
    main()
  File "run_qa.py", line 311, in main
    optimizer.export(
  File "/opt/conda/lib/python3.8/site-packages/optimum/onnxruntime/optimization.py", line 142, in export
    optimizer.save_model_to_file(onnx_optimized_model_output_path, use_external_data_format)
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/transformers/models/gpt2/../../onnx_model.py", line 934, in save_model_to_file
    save_model(self.model, output_path)
  File "/opt/conda/lib/python3.8/site-packages/onnx/__init__.py", line 202, in save_model
    s = _serialize(proto)
  File "/opt/conda/lib/python3.8/site-packages/onnx/__init__.py", line 71, in _serialize
    result = proto.SerializeToString()
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 3038449703
The save fails with the ModelProto > 2GB protobuf limit despite the model weights being well under 2GB (~870MB).
Expected behavior
The optimized ONNX model is successfully saved.
Top GitHub Comments
To follow up on the issue, here is the thread in onnx where we continued the discussion.
It turns out that, given the hard constraint of the protobuf size limit (2GB), ONNX offers options to export large tensors to external files, and users can tune these parameters to find the best fit.
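For illustration, a minimal sketch of those external-data options as exposed by onnx.save_model; the file names and size threshold here are illustrative choices, not values recommended in the thread:

```python
import onnx

# Hypothetical paths for illustration.
model = onnx.load("albert_xxl.onnx")

# Export every tensor above size_threshold (in bytes) to an external data
# file so the ModelProto itself stays under the 2GB protobuf limit.
onnx.save_model(
    model,
    "albert_xxl_external.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="albert_xxl_external.onnx_data",
    size_threshold=1024,
    convert_attribute=False,
)
```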
However, for several extremely large models (the case of AlbertXXL), the structural proto could still exceed 2GB after exporting all tensors to external files. In this case, a workaround would be to either load the fp16 model weights (if the model was also trained with mixed precision) or use `ORTQuantizer` to proceed with the quantization. A PR is in progress to improve the compatibility of `ORTOptimizer` and `ORTQuantizer` in cases of large ONNX protos. BTW, if you are not using a vision model, setting `optimization_level=2` is generally good enough.
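For reference, a minimal sketch of the `ORTQuantizer` workaround, assuming the Optimum 1.2.x API used in this report; the paths and the choice of quantization config are illustrative, not the only option:

```python
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Dynamic INT8 quantization shrinks the weights roughly 4x versus fp32,
# which should keep the serialized proto under the 2GB protobuf limit.
# Output paths and the avx512_vnni config are illustrative assumptions.
quantizer = ORTQuantizer.from_pretrained(
    "ahotrod/albert_xxlargev1_squad2_512", feature="question-answering"
)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.export(
    onnx_model_path="albert_xxl.onnx",
    onnx_quantized_model_output_path="albert_xxl_quantized.onnx",
    quantization_config=qconfig,
)
```

Alternatively, rerunning the reproduction command above with `--optimization_level 2` instead of `99` follows the "generally good enough" suggestion without changing anything else.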