Is it possible to convert the onnx model to an fp16 model?
The torch example takes the parameter revision="fp16"; can the ONNX model be optimized the same way? Current ONNX inference (using CUDAExecutionProvider) is slower than the torch version and uses more GPU memory (12 GB vs 4 GB).
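For reference, the half-precision PyTorch path the question refers to looks roughly like this (a minimal sketch using diffusers; the model ID is taken from the export command later in this thread):

```python
# Minimal sketch of the PyTorch/diffusers half-precision pipeline referenced above.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",            # fetch the published fp16 weight branch
    torch_dtype=torch.float16,  # run the pipeline in half precision
).to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
```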
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Latest script can be downloaded here: https://github.com/tianleiwu/diffusers/blob/tlwu/benchmark/scripts/convert_sd_onnx_to_fp16.py
Example script to convert FP32 to FP16:
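The inline example from the comment is not reproduced on this page. A minimal sketch of a plain FP32-to-FP16 conversion using onnxconverter-common (paths here are placeholders, and the linked script above does more, such as keeping precision-sensitive ops in FP32):

```python
# Minimal FP32 -> FP16 conversion sketch (placeholder paths).
# keep_io_types=True leaves the model's inputs/outputs in FP32 so callers
# do not have to change the tensors they feed in.
import onnx
from onnxconverter_common import float16

model = onnx.load("./stable_diffusion_onnx/unet/model.onnx")
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
onnx.save_model(
    model_fp16,
    "./stable_diffusion_onnx_fp16/unet/model.onnx",
    save_as_external_data=True,    # the UNet can brush up against the 2 GB protobuf limit
    all_tensors_to_one_file=True,
)
```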
To get best performance, please set providers like the following:
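The exact settings from the comment are not reproduced here; following the tuning guide linked below, a sketch for a convolution-heavy model such as the UNet (placeholder model path) is:

```python
# Provider configuration sketch (placeholder model path).
# cudnn_conv_use_max_workspace lets cuDNN use a larger workspace when searching
# for the fastest convolution algorithms, which helps convolution-heavy models.
import onnxruntime as ort

providers = [
    ("CUDAExecutionProvider", {
        "device_id": 0,
        "cudnn_conv_use_max_workspace": "1",
    }),
    "CPUExecutionProvider",
]

session = ort.InferenceSession(
    "./stable_diffusion_onnx_fp16/unet/model.onnx",
    providers=providers,
)
```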
See https://onnxruntime.ai/docs/performance/tune-performance.html#convolution-heavy-models-and-the-cuda-ep for more info.
Latency (seconds per query) for GPU: see the benchmark table in the original GitHub comment.
First, get the full-precision onnx model locally from the onnx exporter (convert_stable_diffusion_checkpoint_to_onnx.py). For example:
python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx" --height=512 --width=512
Then modify pipeline_stable_diffusion_onnx.py to call out to auto_convert_mixed_precision_model_path with the relevant paths and inputs at the right points in the pipeline. I modified it like so: https://gist.github.com/wareya/0d5d111b1e2448a3e99e8be2b39fbcf1 (I've modified this since the last time I ran it, so it might be slightly broken. YMMV. Also, this is extremely hacky and bad, and you shouldn't encourage other people to do it this way or do it this way in any release pipelines, but it was the fastest way for me to get it done locally.)
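The gist is the reference for how the call is wired in; the part that is easy to get wrong is the feed of example inputs for the sub-model being converted. A rough sketch for the UNet exported above (input names, shapes, and dtypes are assumptions for SD v1.4 at 512x512 and should be checked against the exported model, e.g. with Netron):

```python
# Illustrative input feed for the UNet sub-model; names/shapes/dtypes are assumptions
# for SD v1.4 at 512x512 with classifier-free guidance (batch of 2) and must be
# verified against the exported model before use.
import numpy as np

unet_input_feed = {
    "sample": np.random.randn(2, 4, 64, 64).astype(np.float32),
    "timestep": np.array([981], dtype=np.int64),
    "encoder_hidden_states": np.random.randn(2, 77, 768).astype(np.float32),
}

# This feed, the FP32 source path ("./stable_diffusion_onnx/unet/model.onnx"), and an
# output path for the FP16 model are what get passed to
# auto_convert_mixed_precision_model_path; consult its docstring for the exact
# argument order in your onnxruntime build.
```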
However, on my system, this crashes inside of auto_mixed_precision_model_path.py because it tries to delete files that it still has open. This might be a bug in the exact version of the onnx runtime that I'm running (a nightly build). To work around it, I modify _clean_output_folder in auto_mixed_precision_model_path.py like this (this is evil, might only work on Windows 10 rather than other versions of Windows, and will not work on non-Windows OSes):
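The snippet itself was not preserved on this page, and the original hack was Windows-specific; a minimal, portable sketch of the same idea, i.e. skip the failing deletions instead of crashing, at the cost of leaving temp files behind:

```python
# Sketch of the workaround's intent only; the real _clean_output_folder in
# auto_mixed_precision_model_path.py has different internals and a different signature.
import shutil

def _clean_output_folder(tmp_dir: str) -> None:
    # ignore_errors=True swallows "file still in use" errors (e.g. Windows sharing
    # violations) so the conversion can finish; stray temp files are left behind.
    shutil.rmtree(tmp_dir, ignore_errors=True)
```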