InferenceServerClient request time is larger than model-analyzer's
See original GitHub issue

Description
I trained a ResNet-50 model with PyTorch, converted it model.pt –> model.onnx –> model.trt, and used model-analyzer to profile it. For resnet_trt_config_default, the reported p99 latency is 48.3 ms.
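For reference, a model-analyzer invocation along these lines produces such a result; the repository path and flags here are my guesses, not taken from the issue:

model-analyzer profile --model-repository /model_rep --profile-models resnet_trt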
Finally, I served the model with the resnet_trt_config_default config:
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /model_rep:/models nvcr.io/nvidia/tritonserver:22.07-py3 tritonserver --model-repository=/models
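As a sanity check (my addition, not in the original issue), the server's readiness can be confirmed before timing anything, using Triton's standard HTTP health endpoint:

curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready   # prints 200 once the server and models are ready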
Then I sent a request to the server from the client:

import time
import tritonclient.http as httpclient  # imports added for completeness

triton_client = httpclient.InferenceServerClient(url="localhost:8000")  # assumed HTTP client, per the -p 8000 mapping above
start_time = time.time()
results = triton_client.infer('resnet_pt', inputs=inputs, outputs=outputs)  # inputs/outputs prepared elsewhere (not shown)
print("infer time:", time.time() - start_time)
# infer time: 602.1ms
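For comparison, here is a minimal self-contained sketch of the same measurement. The input shape and tensor names are taken from the config below, the model name follows the config (the snippet above calls 'resnet_pt'), and the warm-up call and timed loop are my additions: a single first request typically pays one-time initialization costs that steady-state measurements exclude.

import time
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy input matching the config: FP32, dims [3, 297, 640], batch size 1.
data = np.random.rand(1, 3, 297, 640).astype(np.float32)
inp = httpclient.InferInput("input", list(data.shape), "FP32")
inp.set_data_from_numpy(data)
out = httpclient.InferRequestedOutput("output")

# Warm-up request: the first call often includes one-time setup cost.
client.infer("resnet_trt", inputs=[inp], outputs=[out])

# Steady-state timing, closer to what model-analyzer reports.
latencies = []
for _ in range(100):
    t0 = time.time()
    client.infer("resnet_trt", inputs=[inp], outputs=[out])
    latencies.append(time.time() - t0)
print("p99 latency: %.1f ms" % (1000 * float(np.percentile(latencies, 99))))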
The measured inference time is far larger than model-analyzer's 48.3 ms p99.
Triton Information
image: nvcr.io/nvidia/tritonserver:22.07-py3
cuda: 11.6
config:
name: "resnet_trt"
platform: "tensorrt_plan"
max_batch_size: 1
input {
name: "input"
data_type: TYPE_FP32
dims: 3
dims: 297
dims: 640
}
output {
name: "output"
data_type: TYPE_FP32
dims: 2
}
instance_group {
count: 6
kind: KIND_GPU
}
dynamic_batching {
}
backend: "tensorrt"
Expected behavior
Why is the client-side inference time so much larger than the model-analyzer result? Any advice would be much appreciated!
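One way to make the comparison apples-to-apples (my suggestion, not from the original thread) is to measure the running server with perf_analyzer, the tool model-analyzer drives internally. It reports steady-state latency over many requests rather than a single cold call, so its numbers should line up with the model-analyzer table:

perf_analyzer -m resnet_trt -u localhost:8000 --percentile=99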
Issue Analytics
- State: Closed
- Created a year ago
- Comments:5 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thanks! I will try it
My pleasure! Good luck. Closing ticket now. If you need further assistance with this issue, let me know and I’ll reopen it.