
Is it possible to convert the onnx model to fp16 model?

See original GitHub issue

The torch example takes the parameter revision="fp16"; can the ONNX model do the same optimization? Currently, ONNX inference (using CUDAExecutionProvider) is slower than the torch version and uses more GPU memory (12 GB vs 4 GB).
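
For reference, the PyTorch path mentioned above is presumably something like the sketch below (the model id is an assumption; any Stable Diffusion checkpoint with an fp16 revision works the same way):

    # Hedged sketch of the PyTorch fp16 usage the question compares against.
    # "CompVis/stable-diffusion-v1-4" is an assumed model id.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        revision="fp16",            # pull the half-precision weight branch
        torch_dtype=torch.float16,  # keep the weights in fp16 when loading
    ).to("cuda")
    image = pipe("an astronaut riding a horse").images[0]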

Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 2
  • Comments: 19 (6 by maintainers)

Top GitHub Comments

6 reactions
tianleiwu commented, Dec 3, 2022

The latest script can be downloaded here: https://github.com/tianleiwu/diffusers/blob/tlwu/benchmark/scripts/convert_sd_onnx_to_fp16.py

Example script to convert FP32 to FP16:

# You can clone the source code of onnxruntime and run this script as follows:
#    git clone https://github.com/microsoft/onnxruntime
#    cd onnxruntime/onnxruntime/python/tools/transformers
#    Save this script to that directory as sd_fp16.py. Modify root_dir if needed.
#    python sd_fp16.py
    
import os
import shutil
import onnx
from onnxruntime.transformers.optimizer import optimize_model

# root directory of the onnx pipeline data files
root_dir = "./sd_onnx"

for name in ["unet", "vae_encoder", "vae_decoder", "text_encoder", "safety_checker"]:
    onnx_model_path = f"{root_dir}/{name}/model.onnx"

    # The following fuses LayerNormalization and Gelu. Do it before the fp16 conversion; otherwise they cannot be fused later.
    # Right now, onnxruntime does not save models larger than 2GB, so we use this script to optimize the unet instead.
    m = optimize_model(
        onnx_model_path,
        model_type="bert",
        num_heads=0,
        hidden_size=0,
        opt_level=0,
        optimization_options=None,
        use_gpu=False,
    )

    # Use op_block_list to force some operators to compute in FP32.
    # TODO: might need some tuning to add more operators to op_block_list to reduce accuracy loss.
    if name == "safety_checker":
        m.convert_float_to_float16(op_block_list=["Where"])
    else:
        m.convert_float_to_float16(op_block_list=["RandomNormalLike"])

    # Overwrite the existing model. You can change optimized_model_path to another directory, but then you need to copy the other files (e.g., the tokenizer) manually.
    optimized_model_path = f"{root_dir}/{name}/model.onnx"
    output_dir = os.path.dirname(optimized_model_path)
    shutil.rmtree(output_dir)
    os.mkdir(output_dir)

    onnx.save_model(m.model, optimized_model_path)
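
After the conversion finishes, the FP16 models can be loaded with the regular ONNX pipeline. A minimal sketch, assuming the converted files stay in the original pipeline layout under root_dir:

    # Hedged sketch: load the converted FP16 ONNX pipeline with diffusers.
    from diffusers import OnnxStableDiffusionPipeline

    pipe = OnnxStableDiffusionPipeline.from_pretrained(
        "./sd_onnx",                       # root_dir from the script above
        provider="CUDAExecutionProvider",  # requires onnxruntime-gpu
    )
    image = pipe("an astronaut riding a horse").images[0]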

To get the best performance, set the providers like the following:

   providers = [("CUDAExecutionProvider", {"cudnn_conv_use_max_workspace": '1'})]

See https://onnxruntime.ai/docs/performance/tune-performance.html#convolution-heavy-models-and-the-cuda-ep for more info.
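
That providers list is what you pass when the onnxruntime session is created. A minimal sketch for a single model (the path is an assumption matching the script above):

    # Hedged sketch: pass the CUDA provider options to an InferenceSession.
    import onnxruntime as ort

    providers = [("CUDAExecutionProvider", {"cudnn_conv_use_max_workspace": "1"})]
    session = ort.InferenceSession("./sd_onnx/unet/model.onnx", providers=providers)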

Latency (Seconds per Query) for GPU

    Stable Diffusion Pipeline (text to 512x512 image)    T4     V100   A100
    PyTorch FP16                                          12.8   5.1    3.1
    Onnx FP32                                             26.2   8.3    4.9
    Onnx FP16                                             9.6    3.8    2.4
2 reactions
wareya commented, Sep 16, 2022

First, export the full-precision ONNX model locally with the ONNX exporter script (convert_stable_diffusion_checkpoint_to_onnx.py). For example:

python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx" --height=512 --width=512

Then modify pipeline_stable_diffusion_onnx.py to call auto_convert_mixed_precision_model_path with the relevant paths and inputs at the right points in the pipeline. I modified it like so: https://gist.github.com/wareya/0d5d111b1e2448a3e99e8be2b39fbcf1 (I've modified this since the last time I ran it, so it might be slightly broken; YMMV. Also, this is extremely hacky, so don't encourage other people to do it this way or use it in any release pipeline, but it was the fastest way for me to get it done locally.)
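
For reference, a standalone call to the converter looks roughly like the sketch below. The import path and argument names come from onnxconverter-common and may differ between versions, and the sample input names/shapes are assumptions for a 512x512 Stable Diffusion UNet export; feed the real inputs your pipeline produces:

    # Hedged sketch: convert one exported model to mixed precision,
    # validating the result against sample inputs.
    import numpy as np
    from onnxconverter_common.auto_mixed_precision_model_path import (
        auto_convert_mixed_precision_model_path,
    )

    # Assumed input names/shapes for the exported UNet; adjust to your export.
    sample_inputs = {
        "sample": np.random.randn(2, 4, 64, 64).astype(np.float32),
        "timestep": np.array([1], dtype=np.int64),
        "encoder_hidden_states": np.random.randn(2, 77, 768).astype(np.float32),
    }

    auto_convert_mixed_precision_model_path(
        "./stable_diffusion_onnx/unet/model.onnx",       # source FP32 model
        sample_inputs,                                    # inputs used to check accuracy
        "./stable_diffusion_onnx_fp16/unet/model.onnx",  # target mixed-precision model
        provider=["CUDAExecutionProvider"],
        keep_io_types=True,
    )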

However, on my system this crashes inside auto_mixed_precision_model_path.py because it tries to delete files that it still has open. It might be a bug in the exact (nightly) version of ONNX Runtime I'm running. To work around it, I modified _clean_output_folder in auto_mixed_precision_model_path.py like this (this is evil, might only work on Windows 10 and not other versions of Windows, and will not work on non-Windows OSs):

    # inside _clean_output_folder in auto_mixed_precision_model_path.py
    if os.path.exists(tmp_tensor_path):
        try:
            os.remove(tmp_tensor_path)
        except OSError:
            try:
                # fall back to a forced delete through the Windows shell
                tmp_tensor_path = tmp_tensor_path.replace("/", "\\")
                print(f"force deleting {tmp_tensor_path}")
                os.system(f'cmd /c "del /f {tmp_tensor_path}"')
            except Exception:
                print("no idea what broke here, but something did!")
Read more comments on GitHub.
