
Inference of TorchScript model much slower with Triton than in Python environment

See original GitHub issue

Description
I converted models from an open-source OCR project (EasyOCR) with torch.jit.trace() and deployed them on Triton for inference, but inference through Triton is much slower than running the traced models directly in a Python environment.

Triton Information
Triton version: 21.04 (py3), using the official Triton container (not built from source).

To Reproduce

pip install easyocr

```python
import easyocr
import torch

reader = easyocr.Reader(['ch_sim', 'en'], quantize=False, gpu=False)

imgH = 640
imgW = 352
max_length = 36
batch_size = 8

# dummy inputs used for tracing
image = torch.ones([1, 3, imgH, imgW])
image = torch.autograd.Variable(image).cuda()
img = torch.ones([batch_size, 1, 64, 256]).cuda()
text = torch.ones([1, int(max_length + 1)]).cuda()
```

```python
detector = reader.detector.cuda()
scripted_detector = torch.jit.trace(detector, image)
scripted_detector.save('scripted_detector.pt')

recognizer = reader.recognizer.cuda()
scripted_recognizer = torch.jit.trace(recognizer, (img, text))
scripted_recognizer.save('scripted_recognizer.pt')
```

Model configuration for the detector:

```
name: "detector"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "input_0"
    data_type: TYPE_FP32
    dims: [ 3, 352, 640 ]
  }
]
output [
  {
    name: "output_0"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  },
  {
    name: "output_1"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

Model configuration for the recognizer:

```
name: "recognizer"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "input_0"
    data_type: TYPE_FP32
    dims: [ 1, 64, 256 ]
  },
  {
    name: "input_1"
    data_type: TYPE_INT64
    dims: [ 1, 37 ]
  }
]
output [
  {
    name: "output_0"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

The models are a VGG-based CNN (detector) and an RCNN (recognizer).

Expected behavior
Inference time similar to, or faster than, running the traced models directly with EasyOCR in Python on the GPU.
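For comparison, here is a minimal sketch of how the local TorchScript baseline and the Triton round-trip latency could be measured side by side. It assumes the model repository above is being served on localhost:8000 and that scripted_detector.pt is the file saved by the tracing snippet; the iteration counts and dummy input are arbitrary choices, not part of the original report.

```python
# Hypothetical latency comparison: local traced model vs. Triton HTTP round trip.
import time

import numpy as np
import torch
import tritonclient.http as httpclient

N_ITERS = 100
dummy = np.ones((1, 3, 352, 640), dtype=np.float32)

# Local TorchScript baseline (same traced model that was uploaded to Triton).
detector = torch.jit.load("scripted_detector.pt").cuda().eval()
x = torch.from_numpy(dummy).cuda()
with torch.no_grad():
    for _ in range(10):                      # warm-up
        detector(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(N_ITERS):
        detector(x)
    torch.cuda.synchronize()
local_ms = (time.perf_counter() - start) / N_ITERS * 1e3
print(f"local TorchScript: {local_ms:.2f} ms/inference")

# Triton round trip (includes HTTP and (de)serialization overhead).
client = httpclient.InferenceServerClient(url="localhost:8000")
inp = httpclient.InferInput("input_0", list(dummy.shape), "FP32")
inp.set_data_from_numpy(dummy)
for _ in range(10):                          # warm-up
    client.infer("detector", inputs=[inp])
start = time.perf_counter()
for _ in range(N_ITERS):
    client.infer("detector", inputs=[inp])
triton_ms = (time.perf_counter() - start) / N_ITERS * 1e3
print(f"Triton round trip: {triton_ms:.2f} ms/inference")
```

In practice the perf_analyzer tool shipped with Triton gives a more detailed latency breakdown (queue time, compute time, network) than a hand-rolled loop like this, which helps separate server-side slowness from client overhead.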

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 37 (19 by maintainers)

Top GitHub Comments

1 reaction
nieksand commented, Dec 1, 2021

It looks like it is fixed.

Our EfficientNet latency is now comparable to what we saw on Triton 20.07.

@koval reran his repro script, and performance for 21.11 with autocast enabled looks good:

```
# python3 perf_conv2d.py 1000
Measuring latency with autocast_enabled=False
Avg time per zeropad2d + conv2d (ms): 0.2254070260005392

Measuring latency with autocast_enabled=True
Avg time per zeropad2d + conv2d (ms): 0.22318216899839172
```
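The repro script itself is not shown in the thread; as a rough idea, a micro-benchmark of this kind could look like the sketch below. The layer sizes, tensor shapes, and the measure helper are assumptions for illustration, not @koval's actual perf_conv2d.py.

```python
# Hypothetical ZeroPad2d + Conv2d latency micro-benchmark; shapes are guesses.
import sys
import time

import torch


def measure(autocast_enabled: bool, iters: int) -> float:
    pad = torch.nn.ZeroPad2d(1).cuda()
    conv = torch.nn.Conv2d(64, 64, kernel_size=3).cuda()
    x = torch.randn(8, 64, 112, 112, device="cuda")
    with torch.no_grad(), torch.cuda.amp.autocast(enabled=autocast_enabled):
        for _ in range(10):                  # warm-up
            conv(pad(x))
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            conv(pad(x))
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3


if __name__ == "__main__":
    iters = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    for enabled in (False, True):
        print(f"Measuring latency with autocast_enabled={enabled}")
        print(f"Avg time per zeropad2d + conv2d (ms): {measure(enabled, iters)}")
```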

We have one EfficientNet-based model still showing a 2x latency hit (vs. 10x previously), but this may be a separate issue. We’ll do some profiling and open a fresh ticket if we can narrow down the cause.

Thank you for resolving this!

1 reaction
koval commented, Sep 24, 2021

@Tabrizian All my investigation was done on an NVIDIA T4 GPU in a g4dn.xlarge AWS instance. The numbers for the official PyTorch releases come from the nvcr.io/nvidia/tritonserver:21.07-py3 container, where I installed PyTorch via pip (both the stable 1.9 version and the nightly 1.10 and 1.11 versions).

Read more comments on GitHub >

Top Results From Across the Web

Torchscript backend **MUCH** slower only with FP16 on 1650
I'm converting a pytorch model to torchscript with or without fp16 precision, and I get much slower triton inference when using FP16, ...
Read more >
TorchScript Model inference slow in python - PyTorch Forums
As the snippet below, script model actually get slower than average time 0.12sec in the first iteration in for loop as @driazati mentioned ......
Read more >
Accelerating Inference Up to 6x Faster in PyTorch with Torch ...
In this post, you perform inference through an image classification model called EfficientNet and calculate the throughputs when the model is ...
Read more >
A Quantitative Comparison of Serving Platforms for Neural ...
TensorFlow Serving in default configuration is surprisingly slow compared to TorchServe and Triton Inference Server. The biggest surprise is ...
Read more >
How to Convert a Model from PyTorch to TensorRT and ...
So we'll compare inference time. At first launch, CUDA initializes and caches some data so the first call of any CUDA function is...
Read more >
