Add optimization and quantization options to `optimum.exporters.onnx`
Feature request
It would be nice to have two more arguments in `optimum.exporters.onnx` in order to produce optimized and quantized versions of the exported models alongside the "normal" ones. I can imagine something like:

```
python -m optimum.exporters.onnx --model <model-name> -OX -quantized-arch <arch> output
```
Where:

- `-OX` corresponds to the already available `O1`, `O2`, `O3` and `O4` optimization possibilities.
- `-quantized-arch` can take values such as `arm64`, `avx2`, `avx512`, `avx512_vnni` and `tensorrt`.
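Since these flags are only a proposal, here is a rough sketch of how they could be parsed; the option names and choices are taken from the suggestion above, and everything else (the parser structure, attribute names) is a hypothetical illustration, not the actual `optimum` implementation:

```python
import argparse

def build_parser():
    # Hypothetical parser for the proposed flags; mirrors the issue's
    # suggestion, not an existing optimum API.
    parser = argparse.ArgumentParser("optimum.exporters.onnx (proposed flags)")
    parser.add_argument("--model", required=True, help="Model ID or path to export.")
    parser.add_argument(
        "-O", dest="optimization_level", type=int, choices=[1, 2, 3, 4],
        help="Graph optimization level, matching the existing O1-O4 options.",
    )
    parser.add_argument(
        "--quantized-arch",
        choices=["arm64", "avx2", "avx512", "avx512_vnni", "tensorrt"],
        help="Target architecture for the quantized export.",
    )
    parser.add_argument("output", help="Output directory.")
    return parser

# argparse accepts the fused short-option form, so "-O2" parses as "-O 2".
args = build_parser().parse_args(
    ["--model", "distilbert-base-uncased", "-O2", "--quantized-arch", "avx2", "onnx_out"]
)
print(args.optimization_level, args.quantized_arch)  # 2 avx2
```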
Motivation
This will make it very easy to create optimized/quantized versions of the models we need.
Your contribution
I might help by submitting a PR for it, but I'm not able to give a "when" for now.
Issue Analytics

- State:
- Created 9 months ago
- Reactions: 2
- Comments: 8 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hey @jplu I was thinking of creating the backbone of an `optimum-cli`, for now only supporting the ONNX export (e.g. by simply mapping to `python -m optimum.exporters.onnx`). Once this is done, would you be interested in adding, for example, `optimum-cli --onnxruntime --optimize`? We can discuss the design more if you want 😃

Yes, I think it's a neat idea! Maybe it would be better to have an `optimum-cli` (https://github.com/huggingface/optimum/issues/188) in the same fashion as transformers, and have commands such as:

Dynamic quantization:

ORT optimizations:

And we could actually support the same for OpenVINO, Intel Neural Compressor, etc.
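For context on what a quantized export produces, dynamic int8 quantization can be sketched in plain Python. This is a simplified per-tensor symmetric scheme for illustration only, not optimum's or onnxruntime's actual implementation:

```python
def quantize_dynamic(weights):
    """Map float weights to int8 values with a single per-tensor scale.

    Symmetric scheme: the scale is chosen so the largest-magnitude
    weight maps to +/-127. Real quantizers also handle zero points,
    per-channel scales, and saturation; this sketch omits all of that.
    """
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid scale == 0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.02]
q, scale = quantize_dynamic(weights)
print(q)  # [50, -127, 2]
# Round-trip error is bounded by the scale (here 0.01 per weight).
approx = dequantize(q, scale)
```

The storage win is what the `--quantized-arch` targets exploit: each weight shrinks from 32 bits to 8, and architectures like `avx512_vnni` provide int8 matrix instructions that operate on the quantized values directly.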