No performance improvement when optimizing models
Description
In the config.pbtxt file I specify TensorRT optimization, but inference performance is unchanged.
Triton Information: 21.05-py3, running the pre-built Docker container.
To Reproduce
Model config.pbtxt:
name: "VGG16"
platform: "tensorflow_savedmodel"
max_batch_size: 64
input {
  name: "Input"
  data_type: TYPE_FP32
  dims: [ 224, 224, 3 ]
  format: FORMAT_NHWC
}
output {
  name: "VGG16"
  data_type: TYPE_FP32
  dims: [ 1000 ]
  is_shape_tensor: false
}
optimization {
  execution_accelerators {
    gpu_execution_accelerator: [ {
      name: "tensorrt"
      parameters { key: "precision_mode" value: "FP16" }
    } ]
  }
}
Expected behavior: If I apply TensorRT optimization manually, outside the Triton Server container, inference speed improves by an order of magnitude. I expect the same to happen when loading the model in Triton Server.
# Offline TF-TRT conversion of the SavedModel (TF 2.x)
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode='FP32')
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=self.model_dir,
    conversion_params=params)
converter.convert()
# Persist the converted model so it can be served (output path is a placeholder):
converter.save(output_saved_model_dir=self.output_dir)
Issue Analytics
- Created 2 years ago
- Comments: 7 (4 by maintainers)
The most reliable path is to apply TF-TRT optimization outside of Triton and then serve the resulting TensorFlow model with Triton. If you do that, you should see the full performance improvement provided by the TF-TRT optimization. Using TF-TRT optimization "online" in Triton is less reliable, as you have seen. When doing the optimization offline, be sure to request FP16 precision if that is what you want (as you did in the online specification).
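The offline workflow described above ends with a plain TensorFlow SavedModel, so the online accelerator section can simply be dropped from the config. A minimal sketch of the resulting config.pbtxt (name and dims copied from the issue; this assumes the TF-TRT-converted SavedModel has been placed in the model repository in place of the original):

```
name: "VGG16"
platform: "tensorflow_savedmodel"
max_batch_size: 64
input {
  name: "Input"
  data_type: TYPE_FP32
  dims: [ 224, 224, 3 ]
  format: FORMAT_NHWC
}
output {
  name: "VGG16"
  data_type: TYPE_FP32
  dims: [ 1000 ]
}
# No optimization { execution_accelerators { ... } } block: the TensorRT
# engines are already embedded in the converted SavedModel.
```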
Closing due to inactivity.