Long wait times for first request from TorchScript model
I have two identical models, one in code + weights, the other in TorchScript. Doing inference with the TorchScript one takes far, far longer, which is surprising.
The setup:
The non-TorchScript model is just the DenseNet-161 model archive from the README.md quick start.
The TorchScript model is the same network, exported to TorchScript as follows:
import torch
import torchvision

# Load the pretrained DenseNet-161 weights from torchvision.
d161 = torchvision.models.densenet161(pretrained=True)
# Compile the model to TorchScript and save it for archiving.
tsd161 = torch.jit.script(d161)
tsd161.save('tsd161.pt')
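Before archiving, the export can be sanity-checked by comparing the scripted module against the eager model on a dummy input. A minimal sketch, assuming the standard 224×224 ImageNet input shape and a tolerance I picked myself (neither is part of the original report):
import torch
import torchvision

# Load the eager model and its TorchScript export side by side.
d161 = torchvision.models.densenet161(pretrained=True).eval()
tsd161 = torch.jit.load('tsd161.pt').eval()

# A dummy batch with the usual ImageNet input shape (assumed here).
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    eager_out = d161(x)
    scripted_out = tsd161(x)

# The two outputs should agree to within floating-point noise.
print(torch.allclose(eager_out, scripted_out, atol=1e-5))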
It was then packaged with:
torch-model-archiver --model-name tsd161 --version 1.0 --serialized-file tsd161.pt --handler image_classifier
The server is started with:
torchserve --start --model-store model_store --models densenet161=densenet161.mar tsd161=tsd161.mar
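Once the server is up, one way to confirm both archives were registered is to query the management API; this check is mine, not part of the original report, and assumes TorchServe's default management port of 8081:
import requests

# List the models TorchServe has registered; both densenet161 and tsd161 should appear.
resp = requests.get('http://127.0.0.1:8081/models')
print(resp.json())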
This is the timing output from calling the regular model:
time curl -X POST http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg
[
{
"tiger_cat": 0.46933549642562866
},
{
"tabby": 0.4633878469467163
},
{
"Egyptian_cat": 0.06456148624420166
},
{
"lynx": 0.0012828214094042778
},
{
"plastic_bag": 0.00023323034110944718
}
]
curl -X POST http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg 0.01s user 0.01s system 2% cpu 0.428 total
And from the TorchScript model:
time curl -X POST http://127.0.0.1:8080/predictions/tsd161 -T kitten.jpg
[
{
"282": "0.46933549642562866"
},
{
"281": "0.4633878469467163"
},
{
"285": "0.06456148624420166"
},
{
"287": "0.0012828214094042778"
},
{
"728": "0.00023323034110944718"
}
]
curl -X POST http://127.0.0.1:8080/predictions/tsd161 -T kitten.jpg 0.01s user 0.01s system 0% cpu 1:16.54 total
The identical scores between the two (the TorchScript archive just returns class indices instead of human-readable labels, presumably because no index_to_name.json mapping was packaged with it) show we’re dealing with the same model in both instances.
I’m marking this launch-blocking, at least until we understand what’s happening.
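To check that the cost is confined to the first request, a minimal timing sketch against the prediction endpoint shown above could be used; the use of the requests library and the request count are my own choices:
import time
import requests

URL = 'http://127.0.0.1:8080/predictions/tsd161'

with open('kitten.jpg', 'rb') as f:
    payload = f.read()

# Time a handful of identical requests; only the first is expected to be slow.
for i in range(5):
    start = time.perf_counter()
    resp = requests.post(URL, data=payload)
    elapsed = time.perf_counter() - start
    print(f'request {i}: {elapsed:.2f}s, status {resp.status_code}')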
Issue Analytics
- Created 4 years ago
- Comments: 11 (8 by maintainers)
Top GitHub Comments
Filed a JIT ticket for potential improvements: https://github.com/pytorch/pytorch/issues/33354
Sorry, ignore this; I didn't notice that the model was getting deployed onto the GPU without any further setup, so that overhead is probably due to the model being on the GPU, i.e. some CUDA cache coldness. There still seems to be a slight lag on the first calls on CPU, though that is probably negligible.
Thanks!
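The remaining first-call lag on CPU can be reproduced outside TorchServe by timing the first few forward passes of the loaded TorchScript module directly. A minimal sketch, assuming the tsd161.pt file from above and a dummy 224×224 input; the TorchScript profiling executor typically does its optimization work during the first couple of invocations, so those calls are expected to be slower:
import time
import torch

tsd161 = torch.jit.load('tsd161.pt').eval()
x = torch.randn(1, 3, 224, 224)

# The first one or two calls include JIT profiling/optimization work;
# later calls should settle to a steady-state latency.
with torch.no_grad():
    for i in range(5):
        start = time.perf_counter()
        tsd161(x)
        print(f'call {i}: {time.perf_counter() - start:.3f}s')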