Python backend on CPU is slower when serving a PyTorch model
Description
I have a Python model that uses a pre-trained RoBERTa model for inference. I added this model to Triton so it is served by the Python backend. We also serve the exact same Python code/model from a FastAPI application. Both are running on hardware with the same specs. When I compared the two deployments on CPU, the latency with Triton was much higher. I used the PyTorch profiler to debug what is causing the higher latencies with Triton. The screenshots below show the profiler output.
[Screenshot: PyTorch profiler output, Triton-CPU]
[Screenshot: PyTorch profiler output, FastAPI-CPU]
Based on the screenshots, native_layer_norm in particular takes significantly longer under Triton than when the model runs in our FastAPI application. native_layer_norm is part of the pre-trained RoBERTa model.
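For reference, a minimal sketch of how such a comparison can be profiled with the profiler API shipped in torch 1.6.0; the model name and input text here are assumptions, since the issue uses a custom pre-trained RoBERTa:

    import torch
    from transformers import RobertaModel, RobertaTokenizer  # transformers==3.5.1

    # Hypothetical checkpoint; the actual issue uses a custom pre-trained model.
    tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
    model = RobertaModel.from_pretrained("roberta-base")
    model.eval()

    inputs = tokenizer("sample text", return_tensors="pt")

    # torch.autograd.profiler is the profiler available in torch 1.6.0.
    with torch.no_grad():
        with torch.autograd.profiler.profile() as prof:
            model(**inputs)

    # Sort by total CPU time to surface hotspots such as native_layer_norm.
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=15))

Running this same snippet inside the Triton Python backend and inside the FastAPI process is one way to get directly comparable per-operator timings.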
Triton Information
What version of Triton are you using? Version: 21.07
Are you using the Triton container or did you build it yourself? I built the image myself based on r21.07, but I have also tested serving the model with the official Triton containers (r21.07 and r21.08); the issue remains the same.
To Reproduce Steps to reproduce the behavior.
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
Dependencies: torch==1.6.0, transformers==3.5.1
config.pbtxt

    name: "sample-model"
    backend: "python"
    max_batch_size: 8
    input [
      {
        name: "INPUT0"
        data_type: TYPE_STRING
        dims: [ 1 ]
      }
    ]
    output [
      {
        name: "OUTPUT0"
        data_type: TYPE_STRING
        dims: [ 1 ]
      }
    ]
    parameters: {
      key: "EXECUTION_ENV_PATH",
      value: {string_value: "<path to execution env>"}
    }
    instance_group [
      {
        count: 1
        kind: KIND_CPU
      }
    ]
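The issue does not include the model.py, so here is a minimal sketch of what a Python backend model matching this config could look like. The tokenizer/model loading and the JSON post-processing are assumptions for illustration; only the pb_utils calls and the TYPE_STRING tensor handling follow the Python backend API.

    import json

    import numpy as np
    import torch
    import triton_python_backend_utils as pb_utils  # provided by the Triton Python backend
    from transformers import RobertaModel, RobertaTokenizer


    class TritonPythonModel:
        def initialize(self, args):
            # Hypothetical: load a pre-trained RoBERTa; the real model is custom.
            self.tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
            self.model = RobertaModel.from_pretrained("roberta-base")
            self.model.eval()

        def execute(self, requests):
            responses = []
            for request in requests:
                # TYPE_STRING inputs arrive as a numpy array of bytes objects.
                in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
                texts = [t.decode("utf-8") for t in in0.as_numpy().reshape(-1)]

                with torch.no_grad():
                    inputs = self.tokenizer(texts, return_tensors="pt", padding=True)
                    outputs = self.model(**inputs)

                embeddings = outputs[0]  # last hidden state, [batch, seq, hidden]
                # Hypothetical post-processing: one JSON string per input row,
                # keeping the [batch, 1] shape declared in config.pbtxt.
                out_vals = np.array(
                    [[json.dumps({"dim": embeddings.shape[-1]}).encode("utf-8")]
                     for _ in texts],
                    dtype=np.object_,
                )
                out0 = pb_utils.Tensor("OUTPUT0", out_vals)
                responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
            return responses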
Expected behavior
Ideally, the performance should be similar when the same model is run with Triton.
Top GitHub Comments
@tanmayv25 In my initial testing the results look good; the performance is greatly improved. Below is a summary from the initial testing.
[Screenshot: latency summary before fix]
[Screenshot: latency summary after fix]
I have some more testing pending and will update here once I am done with the complete testing.
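(The fix itself is not quoted in this excerpt. As general context, and not necessarily the change applied here: one knob that often explains CPU latency gaps between serving stacks is PyTorch's intra-op thread count, since an oversubscribed thread pool can slow down operators such as layer norm.)

    import torch

    # Assumption for illustration only; this may not be the fix referenced above.
    # Pinning the intra-op thread pool avoids CPU oversubscription when several
    # model instances or server worker threads run concurrently on one host.
    torch.set_num_threads(1)
    print(torch.get_num_threads())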
@tanmayv25 ok, thank you very much.