
InferenceServerClient request time is larger than model-analyzer

See original GitHub issue

Description
I trained a resnet50 model with torch, converted model.pt -> model.onnx -> model.trt, and used model-analyzer to test the model. The results are shown below.

[model-analyzer results screenshots]

For resnet_trt_config_default, the p99 latency is 48.3 ms.

Finally, I ran the model with the resnet_trt_config_default config:

docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v  /model_rep:/models nvcr.io/nvidia/tritonserver:22.07-py3 tritonserver --model-repository=/models

and sent a request to the server:

import time  # triton_client, inputs, and outputs are set up beforehand (see the sketch below)

start_time = time.time()
results = triton_client.infer('resnet_pt', inputs=inputs, outputs=outputs)
print("infer time:", time.time() - start_time)

# infer time: 602.1ms

The infer time is far larger than expected.
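For reference, here is a minimal sketch of how the triton_client, inputs, and outputs used above might be constructed with the tritonclient HTTP API. The input/output names, dims, and datatype are taken from the config below; the model name, the random input tensor, and the warm-up request are illustrative assumptions, not part of the original report.

# Hedged sketch: client setup for the timing snippet above.
import time

import numpy as np
import tritonclient.http as httpclient

triton_client = httpclient.InferenceServerClient(url="localhost:8000")

# One FP32 tensor matching the config below: dims [3, 297, 640] plus the batch dim.
image = np.random.rand(1, 3, 297, 640).astype(np.float32)

inputs = [httpclient.InferInput("input", list(image.shape), "FP32")]
inputs[0].set_data_from_numpy(image)
outputs = [httpclient.InferRequestedOutput("output")]

# Warm-up request so one-time costs (connection setup, CUDA/TensorRT
# initialization on the first request) are not counted in the measured latency.
# Model name follows the config below; the snippet above used 'resnet_pt'.
triton_client.infer("resnet_trt", inputs=inputs, outputs=outputs)

start_time = time.time()
results = triton_client.infer("resnet_trt", inputs=inputs, outputs=outputs)
print("infer time:", time.time() - start_time)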

Triton Information
Image: nvcr.io/nvidia/tritonserver:22.07-py3
CUDA: 11.6
Config:

name: "resnet_trt"
platform: "tensorrt_plan"
max_batch_size: 1
input {
  name: "input"
  data_type: TYPE_FP32
  dims: 3
  dims: 297
  dims: 640
}
output {
  name: "output"
  data_type: TYPE_FP32
  dims: 2
}
instance_group {
  count: 6
  kind: KIND_GPU
}
dynamic_batching {
}
backend: "tensorrt"

Expected behavior
Why is the infer time so much larger than the model-analyzer result? Any advice would be greatly appreciated!

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
Lzhang-hub commented, Nov 9, 2022

Sure! You can use the metrics API to get queue time, infer duration, and a lot more: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md

For debugging purposes (not production), tracing your requests can also get exact timestamps: https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_trace.md

Thanks! I will try it.
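For illustration, the per-model timing counters mentioned above can be read from Triton's Prometheus metrics endpoint (port 8002 in the docker command above). A rough sketch, assuming the requests package is installed and the server is reachable on localhost; the metric names are the ones documented in metrics.md:

import requests

# Scrape the Prometheus-format metrics exposed by tritonserver.
metrics = requests.get("http://localhost:8002/metrics").text

# Cumulative per-model counters in microseconds; dividing a duration counter by
# nv_inference_request_success gives an average per-request time.
interesting = (
    "nv_inference_request_success",
    "nv_inference_request_duration_us",
    "nv_inference_queue_duration_us",
    "nv_inference_compute_input_duration_us",
    "nv_inference_compute_infer_duration_us",
    "nv_inference_compute_output_duration_us",
)

for line in metrics.splitlines():
    if line.startswith(interesting):
        print(line)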

0 reactions
dyastremsky commented, Nov 9, 2022

My pleasure! Good luck. Closing ticket now. If you need further assistance with this issue, let me know and I’ll reopen it.


Top Results From Across the Web

Deploying GPT-J and T5 with NVIDIA Triton Inference Server
This post is a guide to optimized inference of large transformer models ... code to send requests to the server with accelerated models...

NVIDIA Triton Spam Detection Engine of C-Suite Labs
Application of NVIDIA Triton Inference Server for this use case provides the inference throughput 2.4 times higher than TorchScript ...

Optimizing Model Deployments with Triton Model Analyzer
How do you identify the batch size and number of model instances for the optimal inference performance? Triton Model Analyzer is an offline ...

Triton Inference Server server Issues - Giters
InferenceServerClient request time is large than model-analyzer. Closed 6 days ago 5 ... Triton server container taking long time to launch.
