
Torchserve latest-gpu remote docker image does not support Metrics API



Context

The remote Docker image pytorch/torchserve:latest-gpu does not respond to Metrics API calls on port 8082. See the logs below for the configurations I tested:

  • Deploying torchserve on EKS with the “latest-gpu” image, the Metrics API fails (a port-forward check for isolating cluster networking is sketched after this list).
  • Deploying torchserve on EKS with the “latest” image, metrics API succeeds.
  • Running torchserve locally with the docker image “latest-gpu”, metrics API fails.
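
When the same image behaves differently on EKS and locally, a quick way to rule out cluster networking is to bypass the Kubernetes Service and hit the pod directly. A minimal sketch, with the pod name left as a placeholder for whatever the deployment creates:

# forward the metrics port of the TorchServe pod straight to the workstation
kubectl port-forward pod/<torchserve-pod-name> 8082:8082

# in a second terminal, query the Metrics API through the tunnel
curl http://localhost:8082/metrics

If this call also fails, the problem sits inside the container image rather than in the Service or Ingress configuration.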

Expected Behavior

Making a request to <torchserve>:8082/metrics should return the Prometheus-formatted metrics.

Current Behavior

Making the request results in an error.

Possible Solution

Steps to Reproduce

  1. Start a container: docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 -p 8082:8082 pytorch/torchserve:latest-gpu
  2. Make at least one inference request: curl -X POST http://localhost:8080/predictions/squeezenet1_1 -T kitten.jpg
  3. Query the Metrics API endpoint: curl http://localhost:8082/metrics; this call fails.
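
One thing worth checking (an assumption, not something confirmed by the logs here) is the config.properties bundled in the GPU image: if its metrics listener is bound to 127.0.0.1 rather than 0.0.0.0, the endpoint is unreachable from outside the container even though TorchServe itself is healthy. A minimal override sketch, using the standard TorchServe property names; the in-container path is an assumption about where the official image reads its configuration:

# config.properties -- bind all three APIs to all interfaces
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082

# mount the override on top of the image's default config and start as before
docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 -p 8082:8082 \
    -v $(pwd)/config.properties:/home/model-server/config.properties \
    pytorch/torchserve:latest-gpu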

Failure Logs [if any]

failure_log.txt

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
maaquib commented, Oct 26, 2020

@jeremiahschung I don’t think the latest GPU image was tagged as pytorch/torchserve:latest-gpu. You should try pytorch/torchserve:0.2.0-cuda10.1-cudnn7-runtime. @seemethere, can you confirm?

https://hub.docker.com/r/pytorch/torchserve/tags?page=1
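
For anyone who wants to try this, the reproduction steps stay the same with only the image tag swapped for the pinned one from the comment above:

# pinned GPU tag instead of latest-gpu; same port mappings as the original repro
docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 -p 8082:8082 pytorch/torchserve:0.2.0-cuda10.1-cudnn7-runtime

Model registration, the inference call, and the metrics query can then follow the same commands shown in the validation log below.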

0 reactions
harshbafna commented, Dec 22, 2020

Validated this on the latest pytorch/torchserve:latest-gpu image:

ubuntu@ip-172-31-69-243:~$ docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 -p 8082:8082 pytorch/torchserve:latest-gpu


ubuntu@ip-172-31-69-243:~$ curl -X POST "http://localhost:8081/models?url=https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar&initial_workers=1"
{
  "status": "Model \"squeezenet1_1\" Version: 1.0 registered with 1 initial workers"
}


ubuntu@ip-172-31-69-243:~$ curl -X POST "http://localhost:8080/predictions/squeezenet1_1" -T serve/examples/image_classifier/kitten.jpg 
{
  "tabby": 0.2752005457878113,
  "lynx": 0.2546869218349457,
  "tiger_cat": 0.24254141747951508,
  "Egyptian_cat": 0.22137463092803955,
  "cougar": 0.0022544844541698694
}


ubuntu@ip-172-31-69-243:~$ curl http://localhost:8082/metrics
# HELP ts_inference_latency_microseconds Cumulative inference duration in microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{uuid="912a19f8-44e3-4752-be02-f5ac428abaf2",model_name="squeezenet1_1",model_version="default",} 37233.815
# HELP ts_queue_latency_microseconds Cumulative queue duration in microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{uuid="912a19f8-44e3-4752-be02-f5ac428abaf2",model_name="squeezenet1_1",model_version="default",} 246.001
# HELP ts_inference_requests_total Total number of inference requests.
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{uuid="912a19f8-44e3-4752-be02-f5ac428abaf2",model_name="squeezenet1_1",model_version="default",} 1.0
ubuntu@ip-172-31-69-243:~$ 

Closing.


