Is it possible to convert the onnx model to an fp16 model?
The torch example takes the parameter revision="fp16"; can the ONNX model be optimized the same way? Current ONNX inference (using CUDAExecutionProvider) is slower than the torch version and uses more GPU memory (12 GB vs 4 GB).
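For reference, the half-precision PyTorch path the question refers to looks roughly like this (a minimal sketch using diffusers; the model ID is taken from the export command later in this thread):

```python
# Minimal sketch of the PyTorch/diffusers half-precision pipeline referenced above.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",            # fetch the published fp16 weight branch
    torch_dtype=torch.float16,  # run the pipeline in half precision
).to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
```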
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Latest script can be downloaded here: https://github.com/tianleiwu/diffusers/blob/tlwu/benchmark/scripts/convert_sd_onnx_to_fp16.py
Example script to convert FP32 to FP16:
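The inline example from the comment is not reproduced on this page. A minimal sketch of a plain FP32-to-FP16 conversion using onnxconverter-common (paths here are placeholders, and the linked script above does more, such as keeping precision-sensitive ops in FP32):

```python
# Minimal FP32 -> FP16 conversion sketch (placeholder paths).
# keep_io_types=True leaves the model's inputs/outputs in FP32 so callers
# do not have to change the tensors they feed in.
import onnx
from onnxconverter_common import float16

model = onnx.load("./stable_diffusion_onnx/unet/model.onnx")
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
onnx.save_model(
    model_fp16,
    "./stable_diffusion_onnx_fp16/unet/model.onnx",
    save_as_external_data=True,    # the UNet can brush up against the 2 GB protobuf limit
    all_tensors_to_one_file=True,
)
```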
To get best performance, please set providers like the following:
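The exact settings from the comment are not reproduced here; following the tuning guide linked below, a sketch for a convolution-heavy model such as the UNet (placeholder model path) is:

```python
# Provider configuration sketch (placeholder model path).
# cudnn_conv_use_max_workspace lets cuDNN use a larger workspace when searching
# for the fastest convolution algorithms, which helps convolution-heavy models.
import onnxruntime as ort

providers = [
    ("CUDAExecutionProvider", {
        "device_id": 0,
        "cudnn_conv_use_max_workspace": "1",
    }),
    "CPUExecutionProvider",
]

session = ort.InferenceSession(
    "./stable_diffusion_onnx_fp16/unet/model.onnx",
    providers=providers,
)
```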
See https://onnxruntime.ai/docs/performance/tune-performance.html#convolution-heavy-models-and-the-cuda-ep for more info.
Latency (seconds per query) for GPU: see the benchmark table in the original GitHub comment.
First, get the full-precision onnx model locally from the onnx exporter (convert_stable_diffusion_checkpoint_to_onnx.py). For example:
python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx" --height=512 --width=512
Then modify pipeline_stable_diffusion_onnx.py to call out to auto_convert_mixed_precision_model_path with the relevant paths and inputs at the right points in the pipeline. I modified it like so: https://gist.github.com/wareya/0d5d111b1e2448a3e99e8be2b39fbcf1 (I've modified this since the last time I ran it, so it might be slightly broken. YMMV. Also, this is extremely hacky and bad, and you shouldn't encourage other people to do it this way or do it this way in any release pipelines, but it was the fastest way for me to get it done locally.)
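The gist is the reference for how the call is wired in; the part that is easy to get wrong is the feed of example inputs for the sub-model being converted. A rough sketch for the UNet exported above (input names, shapes, and dtypes are assumptions for SD v1.4 at 512x512 and should be checked against the exported model, e.g. with Netron):

```python
# Illustrative input feed for the UNet sub-model; names/shapes/dtypes are assumptions
# for SD v1.4 at 512x512 with classifier-free guidance (batch of 2) and must be
# verified against the exported model before use.
import numpy as np

unet_input_feed = {
    "sample": np.random.randn(2, 4, 64, 64).astype(np.float32),
    "timestep": np.array([981], dtype=np.int64),
    "encoder_hidden_states": np.random.randn(2, 77, 768).astype(np.float32),
}

# This feed, the FP32 source path ("./stable_diffusion_onnx/unet/model.onnx"), and an
# output path for the FP16 model are what get passed to
# auto_convert_mixed_precision_model_path; consult its docstring for the exact
# argument order in your onnxruntime build.
```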
However, on my system, this crashes inside of auto_mixed_precision_model_path.py because it tries to delete files that it still has open. This might be a bug in the exact version of the onnx runtime that I'm running (a nightly build). To work around it, I modify _clean_output_folder in auto_mixed_precision_model_path.py like this (this is evil, might only work on Windows 10 rather than other versions of Windows, and will not work on non-Windows OSes):
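The snippet itself was not preserved on this page, and the original hack was Windows-specific; a minimal, portable sketch of the same idea, i.e. skip the failing deletions instead of crashing, at the cost of leaving temp files behind:

```python
# Sketch of the workaround's intent only; the real _clean_output_folder in
# auto_mixed_precision_model_path.py has different internals and a different signature.
import shutil

def _clean_output_folder(tmp_dir: str) -> None:
    # ignore_errors=True swallows "file still in use" errors (e.g. Windows sharing
    # violations) so the conversion can finish; stray temp files are left behind.
    shutil.rmtree(tmp_dir, ignore_errors=True)
```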