Add optimization and quantization options to `optimum.exporters.onnx`
Feature request
It would be nice to have two more arguments in `optimum.exporters.onnx` in order to produce optimized and quantized versions of the exported models alongside the "normal" ones. I can imagine something like:

```
python -m optimum.exporters.onnx --model <model-name> -OX -quantized-arch <arch> output
```
Where:

- `-OX` corresponds to the already available `O1`, `O2`, `O3` and `O4` optimization possibilities.
- `-quantized-arch` can take values such as `arm64`, `avx2`, `avx512`, `avx512_vnni` and `tensorrt`.
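Since these flags are only a proposal, here is a rough sketch of how they could be parsed; the option names and choices are taken from the suggestion above, and everything else (the parser structure, attribute names) is a hypothetical illustration, not the actual `optimum` implementation:

```python
import argparse

def build_parser():
    # Hypothetical parser for the proposed flags; mirrors the issue's
    # suggestion, not an existing optimum API.
    parser = argparse.ArgumentParser("optimum.exporters.onnx (proposed flags)")
    parser.add_argument("--model", required=True, help="Model ID or path to export.")
    parser.add_argument(
        "-O", dest="optimization_level", type=int, choices=[1, 2, 3, 4],
        help="Graph optimization level, matching the existing O1-O4 options.",
    )
    parser.add_argument(
        "--quantized-arch",
        choices=["arm64", "avx2", "avx512", "avx512_vnni", "tensorrt"],
        help="Target architecture for the quantized export.",
    )
    parser.add_argument("output", help="Output directory.")
    return parser

# argparse accepts the fused short-option form, so "-O2" parses as "-O 2".
args = build_parser().parse_args(
    ["--model", "distilbert-base-uncased", "-O2", "--quantized-arch", "avx2", "onnx_out"]
)
print(args.optimization_level, args.quantized_arch)  # 2 avx2
```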
Motivation
This will make it very easy to create optimized/quantized versions of the models we need.
Your contribution
I might help by submitting a PR for it, but I'm not able to give a "when" for now.
Issue Analytics

- State:
- Created 9 months ago
- Reactions: 2
- Comments: 8 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hey @jplu I was thinking of creating the backbone of an `optimum-cli`, for now only supporting the ONNX export (e.g. by simply mapping to `python -m optimum.exporters.onnx`). Once this is done, would you be interested in adding, for example, `optimum-cli --onnxruntime --optimize`? We can discuss the design more if you want 😃

Yes, I think it's a neat idea! Maybe it would be better to have an `optimum-cli` (https://github.com/huggingface/optimum/issues/188) in the same fashion as transformers, and have commands such as:

Dynamic quantization:

ORT optimizations:

And we could actually support the same for OpenVINO, Intel Neural Compressor, etc.
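For context on what a quantized export produces, dynamic int8 quantization can be sketched in plain Python. This is a simplified per-tensor symmetric scheme for illustration only, not optimum's or onnxruntime's actual implementation:

```python
def quantize_dynamic(weights):
    """Map float weights to int8 values with a single per-tensor scale.

    Symmetric scheme: the scale is chosen so the largest-magnitude
    weight maps to +/-127. Real quantizers also handle zero points,
    per-channel scales, and saturation; this sketch omits all of that.
    """
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid scale == 0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.02]
q, scale = quantize_dynamic(weights)
print(q)  # [50, -127, 2]
# Round-trip error is bounded by the scale (here 0.01 per weight).
approx = dequantize(q, scale)
```

The storage win is what the `--quantized-arch` targets exploit: each weight shrinks from 32 bits to 8, and architectures like `avx512_vnni` provide int8 matrix instructions that operate on the quantized values directly.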