
Add optimization and quantization options to `optimum.exporters.onnx`

Feature request

It would be nice to have two more arguments in `optimum.exporters.onnx` so that optimized and quantized versions of the exported models are produced alongside the “normal” ones. I can imagine something like the command below; a sketch of the equivalent steps with today’s Python API follows the argument list.

python -m optimum.exporters.onnx --model <model-name> -OX -quantized-arch <arch> output

Where:

  • -OX corresponds to the already available O1, O2, O3, and O4 optimization levels.
  • -quantized-arch can take values such as arm64, avx2, avx512, avx512_vnni, and tensorrt.
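
For context, the three steps the proposed flags would combine already exist in the `optimum.onnxruntime` Python API. The following is a minimal sketch of that workflow, assuming a recent `optimum` release; the model name is only an example, and exact keyword names (e.g. `export=True` versus the older `from_transformers=True`) vary between versions.

```python
# Sketch of what the proposed flags would bundle into one command, using the
# existing optimum.onnxruntime Python API (argument names may differ between
# optimum versions; the model id is just an example).
from optimum.onnxruntime import (
    ORTModelForSequenceClassification,
    ORTOptimizer,
    ORTQuantizer,
)
from optimum.onnxruntime.configuration import AutoQuantizationConfig, OptimizationConfig

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example model

# Step 1: export to ONNX (what `python -m optimum.exporters.onnx` does today).
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
model.save_pretrained("onnx_output/")

# Step 2: graph optimization, roughly what an `-O2` flag would trigger.
optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(
    save_dir="onnx_output_optimized/",
    optimization_config=OptimizationConfig(optimization_level=2),
)

# Step 3: dynamic quantization targeting avx2, roughly what
# `-quantized-arch avx2` would trigger.
quantizer = ORTQuantizer.from_pretrained(model)
quantizer.quantize(
    save_dir="onnx_output_quantized/",
    quantization_config=AutoQuantizationConfig.avx2(is_static=False, per_channel=False),
)
```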

Motivation

This would make it very easy to create optimized/quantized versions of the models we need.

Your contribution

I might help by submitting a PR for it, but I’m not able to give a “when” for now.

Issue Analytics

  • State: open
  • Created: 9 months ago
  • Reactions: 2
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions
fxmarty commented, Dec 12, 2022

Hey @jplu, I was thinking of creating the backbone of an optimum-cli, for now only supporting the ONNX export (e.g. by simply mapping to python -m optimum.exporters.onnx).

Once this is done, would you be interested in adding, for example, optimum-cli --onnxruntime --optimize? We can discuss the design further if you want 😃
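
A first version of such a backbone could simply forward its arguments to the existing module entry point. The sketch below is purely illustrative, not the actual optimum-cli implementation; the command parsing shown is hypothetical.

```python
# optimum_cli_sketch.py -- illustrative only, not the real optimum-cli.
# Forwards `export --onnx ...` to the existing `python -m optimum.exporters.onnx`.
import subprocess
import sys


def main() -> None:
    args = sys.argv[1:]
    if len(args) >= 2 and args[0] == "export" and args[1] == "--onnx":
        # Everything after `export --onnx` is passed through unchanged.
        cmd = [sys.executable, "-m", "optimum.exporters.onnx", *args[2:]]
        raise SystemExit(subprocess.call(cmd))
    raise SystemExit(f"Unknown command: {' '.join(args)}")


if __name__ == "__main__":
    main()
```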

2 reactions
fxmarty commented, Dec 9, 2022

Yes, I think it’s a neat idea! Maybe it would be better to have an optimum-cli (https://github.com/huggingface/optimum/issues/188) in the same fashion as transformers, and have commands such as:

optimum-cli export --onnx --model <model-name> onnx_output/

Dynamic quantization:

optimum-cli quantize --onnxruntime --arch avx2 --path onnx_output/

ORT optimizations:

optimum-cli optimize --onnxruntime -O2 --path onnx_output/

And we could actually support the same for OpenVINO, Intel Neural Compressor, etc.
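
To make the proposed layout concrete, here is a hypothetical argparse skeleton mirroring the three commands above; it is only a design sketch and does not reflect how optimum-cli was actually implemented.

```python
# Design sketch only: an argparse skeleton mirroring the proposed commands.
# It parses the commands shown above but does not implement them.
import argparse

parser = argparse.ArgumentParser(prog="optimum-cli")
subparsers = parser.add_subparsers(dest="command", required=True)

# optimum-cli export --onnx --model <model-name> onnx_output/
export_p = subparsers.add_parser("export", help="Export a model to a backend format")
export_p.add_argument("--onnx", action="store_true")
export_p.add_argument("--model", required=True)
export_p.add_argument("output")

# optimum-cli quantize --onnxruntime --arch avx2 --path onnx_output/
quantize_p = subparsers.add_parser("quantize", help="Quantize an exported model")
quantize_p.add_argument("--onnxruntime", action="store_true")
quantize_p.add_argument("--arch", choices=["arm64", "avx2", "avx512", "avx512_vnni", "tensorrt"])
quantize_p.add_argument("--path", required=True)

# optimum-cli optimize --onnxruntime -O2 --path onnx_output/
optimize_p = subparsers.add_parser("optimize", help="Apply graph optimizations")
optimize_p.add_argument("--onnxruntime", action="store_true")
optimize_p.add_argument("-O", dest="level", type=int, choices=[1, 2, 3, 4])
optimize_p.add_argument("--path", required=True)

args = parser.parse_args()
print(args)
```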
