
Long wait times for first request from TorchScript model

See original GitHub issue

I have two identical models, one in code + weights, the other in TorchScript. Doing inference with TorchScript takes far, far longer, which is surprising.

The setup:

The non-TorchScript model is just the DenseNet-161 model archive from the README.md quick start.

The TorchScript model is the same one, but exported to TorchScript thus:

import torch
import torchvision
d161 = torchvision.models.densenet161(pretrained=True)
tsd161 = torch.jit.script(d161)
tsd161.save('tsd161.pt')
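
As a quick sanity check outside TorchServe (not part of the original report), the saved file can be loaded back and consecutive calls timed. TorchScript's profiling executor only specializes and optimizes the graph over the first few invocations, so the first call is expected to be noticeably slower than the ones that follow. A minimal sketch, assuming a CPU run and the standard 3x224x224 DenseNet input:

import time
import torch

# Load the scripted model saved above and time a few consecutive calls.
model = torch.jit.load('tsd161.pt')
model.eval()

dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    for i in range(5):
        start = time.perf_counter()
        model(dummy)
        print(f'call {i}: {time.perf_counter() - start:.3f}s')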

It was then packaged with:

torch-model-archiver --model-name tsd161 --version 1.0 --serialized-file tsd161.pt --handler image_classifier

The server is started with:

torchserve --start --model-store model_store --models densenet161=densenet161.mar tsd161=tsd161.mar

This is the timing output from calling the regular model:

time curl -X POST http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg
[
  {
    "tiger_cat": 0.46933549642562866
  },
  {
    "tabby": 0.4633878469467163
  },
  {
    "Egyptian_cat": 0.06456148624420166
  },
  {
    "lynx": 0.0012828214094042778
  },
  {
    "plastic_bag": 0.00023323034110944718
  }
]
curl -X POST http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg  0.01s user 0.01s system 2% cpu 0.428 total

And from the TorchScript:

time curl -X POST http://127.0.0.1:8080/predictions/tsd161 -T kitten.jpg
[
  {
    "282": "0.46933549642562866"
  },
  {
    "281": "0.4633878469467163"
  },
  {
    "285": "0.06456148624420166"
  },
  {
    "287": "0.0012828214094042778"
  },
  {
    "728": "0.00023323034110944718"
  }
]
curl -X POST http://127.0.0.1:8080/predictions/tsd161 -T kitten.jpg  0.01s user 0.01s system 0% cpu 1:16.54 total

The identical probabilities from the two archives (the tsd161 one simply returns class indices instead of human-readable labels) show we're dealing with the same model in both instances.
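
A likely reason for the label difference (not stated in the original issue): the densenet161 quick-start archive bundles an index_to_name.json mapping, while the tsd161 archive above was built without one, so the image_classifier handler falls back to raw class indices. Assuming that same mapping file from the TorchServe examples is available, it can be added to the scripted archive with --extra-files:

torch-model-archiver --model-name tsd161 --version 1.0 --serialized-file tsd161.pt --handler image_classifier --extra-files index_to_name.json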

I’m marking this launch blocking, at least until we understand what’s happening.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 11 (8 by maintainers)

Top GitHub Comments

4 reactions
nairbv commented, Feb 14, 2020

Filed a JIT ticket for potential improvements: https://github.com/pytorch/pytorch/issues/33354

1 reaction
ozancaglayan commented, Jul 25, 2022

Sorry, ignore this. I didn't notice that the model was being deployed onto the GPU without any further setup, so that overhead is probably due to the model being on the GPU and some CUDA cache coldness. There still seems to be a slight lag on the first calls on CPU, though it's probably negligible.

Thanks!
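
A common mitigation for this kind of first-request lag (not from the thread; a minimal sketch assuming TorchServe's stock image_classifier handler API, whose names may differ across versions) is to pay the TorchScript optimization cost at load time by running a few throwaway forward passes in the handler's initialize, so the graph is already specialized before the first real request arrives:

# warmup_handler.py -- hypothetical custom handler; package it with
# torch-model-archiver ... --handler warmup_handler.py
import torch
from ts.torch_handler.image_classifier import ImageClassifier

class WarmupImageClassifier(ImageClassifier):
    def initialize(self, context):
        super().initialize(context)
        # Dummy input shape assumed for DenseNet-161 (1x3x224x224).
        dummy = torch.randn(1, 3, 224, 224, device=self.device)
        with torch.no_grad():
            for _ in range(3):
                self.model(dummy)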
