
Triton server is slower than PyTorch model

See original GitHub issue

Description
I converted the model to TorchScript format and call Triton through its gRPC client. The model synthesizes successfully, but it is very slow under Triton: for a single short sentence of 4 words, the standalone PyTorch model takes less than 0.1 seconds, while the same request through the Triton server takes about 1.5 seconds.
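
As a point of reference, a minimal sketch of sending and timing such a request with Triton’s Python gRPC client (tritonclient.grpc). The server URL, token IDs, and input shapes below are illustrative assumptions, not values taken from the issue:

import time
import numpy as np
import tritonclient.grpc as grpcclient

# Assumed server address; adjust to the actual deployment.
client = grpcclient.InferenceServerClient(url="localhost:8001")

# Dummy 4-token input; the IDs are placeholders, not the real FastPitch vocabulary.
tokens = np.random.randint(0, 100, size=(1, 4)).astype(np.int64)
length = np.array([[4]], dtype=np.int64)

inputs = [
    grpcclient.InferInput("INPUT__0", tokens.shape, "INT64"),
    grpcclient.InferInput("INPUT__1", length.shape, "INT64"),
]
inputs[0].set_data_from_numpy(tokens)
inputs[1].set_data_from_numpy(length)

start = time.time()
result = client.infer(model_name="FastPitch", inputs=inputs)
print(f"round-trip latency: {time.time() - start:.3f} s")
print("OUTPUT__0 shape:", result.as_numpy("OUTPUT__0").shape)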

Triton Information
Triton r21.04 (server version 2.7.0), CUDA 11.2, GPU: RTX 2080 Ti.

To Reproduce
The model is FastPitch served through the pytorch_libtorch backend; its model configuration file (config.pbtxt):

name: "FastPitch"
platform: "pytorch_libtorch"
default_model_filename: "model.pt"

max_batch_size: 8

input {
    name: "INPUT__0"
    data_type: TYPE_INT64
    dims: -1
}
input {
    name: "INPUT__1"
    data_type: TYPE_INT64
    dims: 1
}
output {
    name: "OUTPUT__0"
    data_type: TYPE_FP16
    dims: 80
    dims: -1
}
output {
    name: "OUTPUT__1"
    data_type: TYPE_INT64
    dims: 1
    reshape {
    }
}
output {
    name: "OUTPUT__2"
    data_type: TYPE_FP16
    dims: -1
}
output {
    name: "OUTPUT__3"
    data_type: TYPE_FP16
    dims: -1
}

dynamic_batching {
    preferred_batch_size: [ 4, 8 ]
}

instance_group {
    count: 1
    gpus: 0
    kind: KIND_GPU
}
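
To separate model time from serving overhead, it can help to time the same TorchScript file directly in PyTorch. A minimal sketch, assuming model.pt takes exactly the two INT64 inputs described in the config above (the path, shapes, and token values are placeholders):

import time
import torch

# Load the same TorchScript file that Triton serves (path assumed).
model = torch.jit.load("model.pt").cuda().eval()

# Placeholder inputs matching the config: [batch, seq] token IDs and a [batch, 1] tensor.
tokens = torch.randint(0, 100, (1, 4), dtype=torch.int64, device="cuda")
length = torch.tensor([[4]], dtype=torch.int64, device="cuda")

with torch.no_grad():
    # TorchScript optimizes a model during its first few calls, so warm up before timing.
    for _ in range(5):
        model(tokens, length)
    torch.cuda.synchronize()
    start = time.time()
    model(tokens, length)
    torch.cuda.synchronize()
    print(f"local latency: {time.time() - start:.3f} s")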

Triton server verbose log:

I0908 08:50:40.641507 2516 libtorch.cc:1095] model FastPitch, instance FastPitch_0, executing 1 requests
I0908 08:50:40.641566 2516 libtorch.cc:504] TRITONBACKEND_ModelExecute: Running FastPitch_0 with 1 requests
I0908 08:50:42.023955 2516 infer_response.cc:165] add response output: output: OUTPUT__0, type: FP16, shape: [1,80,81]
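
The ~1.4 s in the log above elapses inside model execution, between TRITONBACKEND_ModelExecute at 08:50:40.64 and the response at 08:50:42.02, so the time is not spent in request transport. One thing worth checking is whether only the first requests are slow, since TorchScript runs JIT optimization during the initial inferences in the LibTorch backend. A minimal check, repeating the assumed client setup from the sketch above so it runs standalone:

import time
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
tokens = np.random.randint(0, 100, size=(1, 4)).astype(np.int64)
length = np.array([[4]], dtype=np.int64)
inputs = [
    grpcclient.InferInput("INPUT__0", tokens.shape, "INT64"),
    grpcclient.InferInput("INPUT__1", length.shape, "INT64"),
]
inputs[0].set_data_from_numpy(tokens)
inputs[1].set_data_from_numpy(length)

# If only request 0 takes ~1.5 s and the rest are fast, the cost is one-time
# TorchScript warmup inside the backend, not steady-state Triton overhead.
for i in range(5):
    start = time.time()
    client.infer(model_name="FastPitch", inputs=inputs)
    print(f"request {i}: {time.time() - start:.3f} s")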

Expected behavior
Latency through Triton comparable to the standalone PyTorch model (well under 1.5 s). I have tried many times but can’t fix it.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
guyqaz commented on Oct 14, 2021

Sorry for the late reply, I will recalculate the result.

0 reactions
CoderHam commented on Nov 15, 2021

@guyqaz closing due to inactivity. I can re-open once you provide us with additional reproduction steps for the issue.

Read more comments on GitHub.

Top Results From Across the Web

[FastPitch/Triton] Triton server is slower than pytorch model
The Model has been synthesized successfully. But the speed of the model in Triton server is very slow, when it only synthesizes 1...
Latest Triton Inference Server - archived topics
  • Mask RCNN TensorRT in Triton: 0 replies, 461 views, July 9, 2020
  • NVlink support issues (hw, board-design): 0 replies, 188 views, July 6, ...
Serving TensorRT Models with NVIDIA Triton Inference Server
Triton TensorRT is Slower than Local TensorRT. Before we end the article, one caveat I have to mention is that Triton server really...
Deploying a PyTorch model with Triton Inference Server in 5 ...
With Triton, it's possible to deploy PyTorch, TensorFlow, or even XGBoost / LightGBM models. Triton can automatically optimize the model for inference on ......
Read more >
Use Triton Inference Server with Amazon SageMaker
SageMaker enables customers to deploy a model using custom code with NVIDIA Triton Inference Server. This functionality is available through the development ...
