Long wait times for first request from TorchScript model
I have two identical models, one in code + weights, the other in TorchScript. Doing inference with the TorchScript one takes far, far longer, which is surprising.
The setup:
The non-TorchScript model is just the DenseNet-161 model archive from the README.md quick start.
The TorchScript model is the same network, exported to TorchScript as follows:
import torch
import torchvision

# Load the pretrained DenseNet-161 weights from torchvision.
d161 = torchvision.models.densenet161(pretrained=True)
# Compile the model to TorchScript and save it for archiving.
tsd161 = torch.jit.script(d161)
tsd161.save('tsd161.pt')
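Before archiving, the export can be sanity-checked by comparing the scripted module against the eager model on a dummy input. A minimal sketch, assuming the standard 224×224 ImageNet input shape and a tolerance I picked myself (neither is part of the original report):
import torch
import torchvision

# Load the eager model and its TorchScript export side by side.
d161 = torchvision.models.densenet161(pretrained=True).eval()
tsd161 = torch.jit.load('tsd161.pt').eval()

# A dummy batch with the usual ImageNet input shape (assumed here).
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    eager_out = d161(x)
    scripted_out = tsd161(x)

# The two outputs should agree to within floating-point noise.
print(torch.allclose(eager_out, scripted_out, atol=1e-5))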
It was then packaged with:
torch-model-archiver --model-name tsd161 --version 1.0 --serialized-file tsd161.pt --handler image_classifier
The server is started with:
torchserve --start --model-store model_store --models densenet161=densenet161.mar tsd161=tsd161.mar
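Once the server is up, one way to confirm both archives were registered is to query the management API; this check is mine, not part of the original report, and assumes TorchServe's default management port of 8081:
import requests

# List the models TorchServe has registered; both densenet161 and tsd161 should appear.
resp = requests.get('http://127.0.0.1:8081/models')
print(resp.json())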
This is the timing output from calling the regular model:
time curl -X POST http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg
[
{
"tiger_cat": 0.46933549642562866
},
{
"tabby": 0.4633878469467163
},
{
"Egyptian_cat": 0.06456148624420166
},
{
"lynx": 0.0012828214094042778
},
{
"plastic_bag": 0.00023323034110944718
}
]
curl -X POST http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg 0.01s user 0.01s system 2% cpu 0.428 total
And from the TorchScript model:
time curl -X POST http://127.0.0.1:8080/predictions/tsd161 -T kitten.jpg
[
{
"282": "0.46933549642562866"
},
{
"281": "0.4633878469467163"
},
{
"285": "0.06456148624420166"
},
{
"287": "0.0012828214094042778"
},
{
"728": "0.00023323034110944718"
}
]
curl -X POST http://127.0.0.1:8080/predictions/tsd161 -T kitten.jpg 0.01s user 0.01s system 0% cpu 1:16.54 total
The identical scores between the two (the TorchScript archive just returns class indices instead of human-readable labels, presumably because no index_to_name.json mapping was packaged with it) show we’re dealing with the same model in both instances.
I’m marking this launch-blocking, at least until we understand what’s happening.
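To check that the cost is confined to the first request, a minimal timing sketch against the prediction endpoint shown above could be used; the use of the requests library and the request count are my own choices:
import time
import requests

URL = 'http://127.0.0.1:8080/predictions/tsd161'

with open('kitten.jpg', 'rb') as f:
    payload = f.read()

# Time a handful of identical requests; only the first is expected to be slow.
for i in range(5):
    start = time.perf_counter()
    resp = requests.post(URL, data=payload)
    elapsed = time.perf_counter() - start
    print(f'request {i}: {elapsed:.2f}s, status {resp.status_code}')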
Issue Analytics
- Created 4 years ago
- Comments: 11 (8 by maintainers)
Top GitHub Comments
Filed a JIT ticket for potential improvements: https://github.com/pytorch/pytorch/issues/33354
Sorry, ignore this; I didn't notice that the model was getting deployed onto the GPU without any further setup, so that overhead is probably due to the model being on the GPU, i.e. some CUDA cache coldness. There still seems to be a slight lag on the first calls on CPU, though that is probably negligible.
Thanks!
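The remaining first-call lag on CPU can be reproduced outside TorchServe by timing the first few forward passes of the loaded TorchScript module directly. A minimal sketch, assuming the tsd161.pt file from above and a dummy 224×224 input; the TorchScript profiling executor typically does its optimization work during the first couple of invocations, so those calls are expected to be slower:
import time
import torch

tsd161 = torch.jit.load('tsd161.pt').eval()
x = torch.randn(1, 3, 224, 224)

# The first one or two calls include JIT profiling/optimization work;
# later calls should settle to a steady-state latency.
with torch.no_grad():
    for i in range(5):
        start = time.perf_counter()
        tsd161(x)
        print(f'call {i}: {time.perf_counter() - start:.3f}s')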