[BUG] ONNX optimization fails when optimizing AlbertXXL despite the weights being under 2GB
System Info
Optimum 1.2.3[onnxruntime-gpu],
PyTorch 1.12.0a0+bd13bc6,
CUDA 11.6, Ubuntu 18.04,
Transformers 4.19.0,
Onnxruntime nightly build (ort-nightly-gpu 1.12.0.dev20220616003), because otherwise there's an error:

Traceback (most recent call last):
  File "run_qa.py", line 524, in <module>
    main()
  File "run_qa.py", line 311, in main
    optimizer.export(
  File "/opt/conda/lib/python3.8/site-packages/optimum/onnxruntime/optimization.py", line 131, in export
    optimizer = optimize_model(
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/transformers/optimizer.py", line 215, in optimize_model
    temp_model_path = optimize_by_onnxruntime(input,
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/transformers/optimizer.py", line 96, in optimize_by_onnxruntime
    session = onnxruntime.InferenceSession(onnx_model_path, sess_options, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 335, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 363, in _create_inference_session
    raise ValueError("This ORT build has {} enabled. ".format(available_providers) +
ValueError: This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)
Onnxruntime needs to be newer than the most recently released tag because even the newest release (1.11.1) doesn't yet pass an explicit execution provider in this code path, hence the nightly build is used instead.
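For context, the error message itself spells out the fix it expects: the providers must be passed explicitly when the session is created. A minimal sketch of that call (the model path is a placeholder):

```python
import onnxruntime

# Since ORT 1.9, execution providers must be set explicitly.
# "model.onnx" is a placeholder path for illustration.
sess_options = onnxruntime.SessionOptions()
session = onnxruntime.InferenceSession(
    "model.onnx",
    sess_options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```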
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
The provided run_qa.py script in the examples/onnxruntime/optimization/question-answering folder doesn't work as expected with ahotrod/albert_xxlargev1_squad2_512.
To reproduce:
python run_qa.py \
  --model_name_or_path ahotrod/albert_xxlargev1_squad2_512 \
  --dataset_name squad_v2 \
  --optimization_level 99 \
  --do_eval \
  --output_dir /home/ubuntu/albert_xxlargev1_squad2_512_onnx_optimized \
  --execution_provider CUDAExecutionProvider \
  --optimize_for_gpu
The resulting error:
Traceback (most recent call last):
  File "run_qa.py", line 524, in <module>
    main()
  File "run_qa.py", line 311, in main
    optimizer.export(
  File "/opt/conda/lib/python3.8/site-packages/optimum/onnxruntime/optimization.py", line 142, in export
    optimizer.save_model_to_file(onnx_optimized_model_output_path, use_external_data_format)
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/transformers/models/gpt2/../../onnx_model.py", line 934, in save_model_to_file
    save_model(self.model, output_path)
  File "/opt/conda/lib/python3.8/site-packages/onnx/__init__.py", line 202, in save_model
    s = _serialize(proto)
  File "/opt/conda/lib/python3.8/site-packages/onnx/__init__.py", line 71, in _serialize
    result = proto.SerializeToString()
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 3038449703
The save fails with the ModelProto > 2GB protobuf limit despite the model weights being well under 2GB (~870MB).
Expected behavior
The optimized ONNX model is successfully saved.
Top GitHub Comments
To follow up on the issue, here is the thread in onnx where we continued the discussion.
It turns out that, given the hard constraint of the protobuf size limit (2GB), ONNX offers options to export large tensors to external files, and users can tune these parameters to find the best fit.
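For illustration, a minimal sketch of those external-data options as exposed by onnx.save_model; the file names and size threshold here are illustrative choices, not values recommended in the thread:

```python
import onnx

# Hypothetical paths for illustration.
model = onnx.load("albert_xxl.onnx")

# Export every tensor above size_threshold (in bytes) to an external data
# file so the ModelProto itself stays under the 2GB protobuf limit.
onnx.save_model(
    model,
    "albert_xxl_external.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="albert_xxl_external.onnx_data",
    size_threshold=1024,
    convert_attribute=False,
)
```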
However, for several extremely large models (the case of AlbertXXL), the structural proto could still exceed 2GB after exporting all tensors to external files. In this case, a workaround would be to either load the fp16 model weights (if the model was also trained with mixed precision) or use `ORTQuantizer` to proceed with the quantization. A PR is in progress to improve the compatibility of `ORTOptimizer` and `ORTQuantizer` in cases of large ONNX protos. BTW, if you are not using a vision model, setting `optimization_level=2` is generally good enough.
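For reference, a minimal sketch of the `ORTQuantizer` workaround, assuming the Optimum 1.2.x API used in this report; the paths and the choice of quantization config are illustrative, not the only option:

```python
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Dynamic INT8 quantization shrinks the weights roughly 4x versus fp32,
# which should keep the serialized proto under the 2GB protobuf limit.
# Output paths and the avx512_vnni config are illustrative assumptions.
quantizer = ORTQuantizer.from_pretrained(
    "ahotrod/albert_xxlargev1_squad2_512", feature="question-answering"
)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.export(
    onnx_model_path="albert_xxl.onnx",
    onnx_quantized_model_output_path="albert_xxl_quantized.onnx",
    quantization_config=qconfig,
)
```

Alternatively, rerunning the reproduction command above with `--optimization_level 2` instead of `99` follows the "generally good enough" suggestion without changing anything else.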