
Torchserve latest-gpu remote docker image does not support Metrics API



Context

The remote Docker image pytorch/torchserve:latest-gpu does not respond to Metrics API calls on port 8082. See the logs below for the configurations I tested:

  • Deploying torchserve on EKS with the “latest-gpu” image, the Metrics API fails (a port-forward check for isolating cluster networking is sketched after this list).
  • Deploying torchserve on EKS with the “latest” image, metrics API succeeds.
  • Running torchserve locally with the docker image “latest-gpu”, metrics API fails.
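
When the same image behaves differently on EKS and locally, a quick way to rule out cluster networking is to bypass the Kubernetes Service and hit the pod directly. A minimal sketch, with the pod name left as a placeholder for whatever the deployment creates:

# forward the metrics port of the TorchServe pod straight to the workstation
kubectl port-forward pod/<torchserve-pod-name> 8082:8082

# in a second terminal, query the Metrics API through the tunnel
curl http://localhost:8082/metrics

If this call also fails, the problem sits inside the container image rather than in the Service or Ingress configuration.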

Expected Behavior

Making a request to <torchserve>:8082/metrics should return the Prometheus-formatted metrics.

Current Behavior

Making the request results in an error.

Possible Solution

Steps to Reproduce

  1. Start a container: docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 -p 8082:8082 pytorch/torchserve:latest-gpu
  2. Make at least one inference request: curl -X POST http://localhost:8080/predictions/squeezenet1_1 -T kitten.jpg
  3. Query the Metrics API endpoint: curl http://localhost:8082/metrics; this call fails.
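
One thing worth checking (an assumption, not something confirmed by the logs here) is the config.properties bundled in the GPU image: if its metrics listener is bound to 127.0.0.1 rather than 0.0.0.0, the endpoint is unreachable from outside the container even though TorchServe itself is healthy. A minimal override sketch, using the standard TorchServe property names; the in-container path is an assumption about where the official image reads its configuration:

# config.properties -- bind all three APIs to all interfaces
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082

# mount the override on top of the image's default config and start as before
docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 -p 8082:8082 \
    -v $(pwd)/config.properties:/home/model-server/config.properties \
    pytorch/torchserve:latest-gpu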

Failure Logs [if any]

failure_log.txt

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
maaquib commented, Oct 26, 2020

@jeremiahschung I don’t think the latest GPU image was tagged as pytorch/torchserve:latest-gpu. You should try pytorch/torchserve:0.2.0-cuda10.1-cudnn7-runtime. @seemethere, can you confirm?

https://hub.docker.com/r/pytorch/torchserve/tags?page=1
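
For anyone who wants to try this, the reproduction steps stay the same with only the image tag swapped for the pinned one from the comment above:

# pinned GPU tag instead of latest-gpu; same port mappings as the original repro
docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 -p 8082:8082 pytorch/torchserve:0.2.0-cuda10.1-cudnn7-runtime

Model registration, the inference call, and the metrics query can then follow the same commands shown in the validation log below.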

0 reactions
harshbafna commented, Dec 22, 2020

Validated this on the latest pytorch/torchserve:latest-gpu image:

ubuntu@ip-172-31-69-243:~$ docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 -p 8082:8082 pytorch/torchserve:latest-gpu


ubuntu@ip-172-31-69-243:~$ curl -X POST "http://localhost:8081/models?url=https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar&initial_workers=1"
{
  "status": "Model \"squeezenet1_1\" Version: 1.0 registered with 1 initial workers"
}


ubuntu@ip-172-31-69-243:~$ curl -X POST "http://localhost:8080/predictions/squeezenet1_1" -T serve/examples/image_classifier/kitten.jpg 
{
  "tabby": 0.2752005457878113,
  "lynx": 0.2546869218349457,
  "tiger_cat": 0.24254141747951508,
  "Egyptian_cat": 0.22137463092803955,
  "cougar": 0.0022544844541698694
}


ubuntu@ip-172-31-69-243:~$ curl http://localhost:8082/metrics
# HELP ts_inference_latency_microseconds Cumulative inference duration in microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{uuid="912a19f8-44e3-4752-be02-f5ac428abaf2",model_name="squeezenet1_1",model_version="default",} 37233.815
# HELP ts_queue_latency_microseconds Cumulative queue duration in microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{uuid="912a19f8-44e3-4752-be02-f5ac428abaf2",model_name="squeezenet1_1",model_version="default",} 246.001
# HELP ts_inference_requests_total Total number of inference requests.
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{uuid="912a19f8-44e3-4752-be02-f5ac428abaf2",model_name="squeezenet1_1",model_version="default",} 1.0
ubuntu@ip-172-31-69-243:~$ 

Closing.


