Triton server is slower than PyTorch model
Description
I have converted the model to TorchScript format and call Triton server using the gRPC client. The model synthesizes successfully, but inference through Triton is very slow: for a single short sentence of 4 words, the PyTorch model takes less than 0.1 seconds, while Triton server takes about 1.5 seconds.
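For context, a minimal sketch of how such a gRPC call and its timing could look, assuming the standard tritonclient Python package; the model name, input names, and dtypes follow the config posted below, while the token IDs and server URL are placeholders:

import time
import numpy as np
import tritonclient.grpc as grpcclient

# Placeholder token IDs for one short sentence; real IDs come from the
# FastPitch text preprocessing.
token_ids = np.array([[12, 57, 3, 41, 8, 20]], dtype=np.int64)  # shape [1, seq_len]
token_len = np.array([[token_ids.shape[1]]], dtype=np.int64)    # shape [1, 1]

client = grpcclient.InferenceServerClient(url="localhost:8001")  # default gRPC port

inputs = [
    grpcclient.InferInput("INPUT__0", list(token_ids.shape), "INT64"),
    grpcclient.InferInput("INPUT__1", list(token_len.shape), "INT64"),
]
inputs[0].set_data_from_numpy(token_ids)
inputs[1].set_data_from_numpy(token_len)
outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

start = time.perf_counter()
result = client.infer(model_name="FastPitch", inputs=inputs, outputs=outputs)
print("Triton round-trip: %.3f s" % (time.perf_counter() - start))
mel = result.as_numpy("OUTPUT__0")  # expected shape [1, 80, T]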
Triton Information
What version of Triton are you using? r21.04 (server version 2.7.0), CUDA 11.2, GPU: RTX 2080 Ti.
To Reproduce
Steps to reproduce the behavior.
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
name: "FastPitch"
platform: "pytorch_libtorch"
default_model_filename: "model.pt"
max_batch_size: 8
input {
  name: "INPUT__0"
  data_type: TYPE_INT64
  dims: -1
}
input {
  name: "INPUT__1"
  data_type: TYPE_INT64
  dims: 1
}
output {
  name: "OUTPUT__0"
  data_type: TYPE_FP16
  dims: 80
  dims: -1
}
output {
  name: "OUTPUT__1"
  data_type: TYPE_INT64
  dims: 1
  reshape {
  }
}
output {
  name: "OUTPUT__2"
  data_type: TYPE_FP16
  dims: -1
}
output {
  name: "OUTPUT__3"
  data_type: TYPE_FP16
  dims: -1
}
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
}
instance_group {
  count: 1
  gpus: 0
  kind: KIND_GPU
}
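For the PyTorch-side figure quoted in the description, the baseline can be measured roughly as in the sketch below (not the exact script used; the file name is assumed to be the same TorchScript file served by Triton, the two int64 inputs mirror INPUT__0/INPUT__1 above, and torch.cuda.synchronize() ensures the asynchronous GPU work is included in the timing):

import time
import torch

# Assumption: the same TorchScript file that Triton serves.
model = torch.jit.load("model.pt").cuda().eval()

# Illustrative inputs matching the config: [1, seq_len] token IDs and [1, 1] length.
token_ids = torch.randint(0, 100, (1, 6), dtype=torch.int64, device="cuda")
token_len = torch.tensor([[6]], dtype=torch.int64, device="cuda")

with torch.no_grad():
    model(token_ids, token_len)   # warm-up so TorchScript JIT optimization is not timed
    torch.cuda.synchronize()

    start = time.perf_counter()
    model(token_ids, token_len)
    torch.cuda.synchronize()
    print("Direct TorchScript: %.3f s" % (time.perf_counter() - start))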
Triton server's verbose log:
I0908 08:50:40.641507 2516 libtorch.cc:1095] model FastPitch, instance FastPitch_0, executing 1 requests
I0908 08:50:40.641566 2516 libtorch.cc:504] TRITONBACKEND_ModelExecute: Running FastPitch_0 with 1 requests
I0908 08:50:42.023955 2516 infer_response.cc:165] add response output: output: OUTPUT__0, type: FP16, shape: [1,80,81]
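The two timestamps above already show that roughly 1.4 s elapses inside the libtorch backend's execute call rather than in the client or network. Triton's per-model statistics can confirm how that time splits between queueing and compute; a sketch using the same tritonclient package (field names assumed to follow the statistics protobuf shipped with the r21.xx clients):

import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")  # default gRPC port

# Cumulative per-model statistics since server start; ns / count gives the
# average time per request spent in each stage.
stats = client.get_inference_statistics(model_name="FastPitch")
infer_stats = stats.model_stats[0].inference_stats
count = infer_stats.success.count
for stage in ("queue", "compute_input", "compute_infer", "compute_output"):
    ns = getattr(infer_stats, stage).ns
    if count:
        print("%s: %.3f ms/request" % (stage, ns / count / 1e6))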
Expected behavior
Inference latency through Triton server should be comparable to running the TorchScript model directly in PyTorch. I have tried many times but cannot fix it.
Top GitHub Comments
Sorry for the late reply, I will recalculate the result.
@guyqaz closing due to inactivity. I can re-open once you provide us with additional reproduction steps for the issue.