Inference of a TorchScript model much slower with Triton than in a Python environment
Description: I converted the models of an open-source OCR project with torch.jit.trace() and deployed them on Triton for inference, but inference is much slower than running the traced models directly in a Python environment.
Triton Information
What version of Triton are you using?
21.04 py3
Are you using the Triton container or did you build it yourself?
container
To Reproduce
Steps to reproduce the behavior.
pip install easyocr
```python
import easyocr
import torch

reader = easyocr.Reader(['ch_sim', 'en'], quantize=False, gpu=False)

imgH = 640
imgW = 352
max_length = 36
batch_size = 8

image = torch.ones([1, 3, imgH, imgW]).cuda()
img = torch.ones([batch_size, 1, 64, 256]).cuda()
text = torch.ones([1, int(max_length + 1)]).cuda()

detector = reader.detector.cuda()
scripted_detector = torch.jit.trace(detector, image)
scripted_detector.save('scripted_detector.pt')

recognizer = reader.recognizer.cuda()
scripted_recognizer = torch.jit.trace(recognizer, (img, text))
scripted_recognizer.save('scripted_recognizer.pt')
```

Model configuration for the detector:

```
name: "detector"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "input_0"
    data_type: TYPE_FP32
    dims: [3, 352, 640]
  }
]
output [
  {
    name: "output_0"
    data_type: TYPE_FP32
    dims: [-1, -1]
  },
  {
    name: "output_1"
    data_type: TYPE_FP32
    dims: [-1, -1]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]
  }
]
```

Model configuration for the recognizer:

```
name: "recognizer"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "input_0"
    data_type: TYPE_FP32
    dims: [1, 64, 256]
  },
  {
    name: "input_1"
    data_type: TYPE_INT64
    dims: [1, 37]
  }
]
output [
  {
    name: "output_0"
    data_type: TYPE_FP32
    dims: [-1, -1]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]
  }
]
```

Describe the models (framework, inputs, outputs): a VGG-based CNN (detector) and an RCNN (recognizer).
Expected behavior: inference time similar to or faster than running the easyOCR project directly in Python on GPU.
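To compare the two paths fairly, the same workload should be timed on the local traced model and on the Triton client call. Below is a minimal, standard-library-only timing helper as a sketch; the zero-argument callable passed in (a local `model(input)` forward, or a Triton client `infer` call) is the reader's choice and not part of the original report. Note that for local GPU inference the callable should synchronize the device (e.g. `torch.cuda.synchronize()`) or the measurement will reflect only kernel-launch time.

```python
import time
import statistics

def measure_latency(fn, warmup=10, iters=100):
    """Time a zero-argument callable; return (mean_ms, p95_ms).

    For GPU work, `fn` should synchronize internally
    (e.g. call torch.cuda.synchronize() after the forward pass),
    otherwise the numbers understate the real latency.
    """
    # Warm-up iterations: let JIT compilation, cuDNN autotuning,
    # and connection setup settle before measuring.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    mean_ms = statistics.fmean(samples)
    # quantiles(n=20) splits the data into 20 slices; the last
    # cut point is the 95th percentile.
    p95_ms = statistics.quantiles(samples, n=20)[-1]
    return mean_ms, p95_ms
```

Running this with identical inputs against both backends gives directly comparable mean and tail latencies, which is useful when attaching numbers to an issue like this one.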
Issue Analytics
- State:
- Created 2 years ago
- Comments: 37 (19 by maintainers)
Top GitHub Comments
It looks like it is fixed.
Our EfficientNet latency is now comparable to what we saw on Triton 20.07.
@koval reran his repro script, and performance for 21.11 with autocast enabled looks good.
We have one EfficientNet based model still showing a 2x latency hit (vs. 10x previously), but this may be a separate issue. We’ll do some profiling and open a fresh ticket if we can narrow down the cause.
Thank you for resolving this!
@Tabrizian All my investigation was done on an NVIDIA T4 GPU in a g4dn.xlarge AWS instance. The numbers for the official PyTorch releases come from the
nvcr.io/nvidia/tritonserver:21.07-py3
container, where I installed PyTorch via pip (both the stable 1.9 release and the nightly 1.10 and 1.11 builds).