Custom pip package installation failed while running torchserve with custom library
2020-10-23 21:31:48,131 [WARN ] main org.pytorch.serve.ModelServer - Failed to load model: mymodel_docker_import.mar org.pytorch.serve.archive.ModelException: Custom pip package installation failed for mymodel
Context
- torchserve version: 0.2.0
- torch version: 1.6.0
- Operating System and version: ubuntu:18.04
Your Environment
- Installed using source? [yes/no]: no
- Are you planning to deploy it using docker container? [yes/no]: yes
- Is it a CPU or GPU environment?: CPU
- Using a default/custom handler? [If possible upload/share custom handler/model]: custom
- What kind of model is it e.g. vision, text, audio?: vision
- Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]: local models from model-store
- Provide config.properties, logs [ts.log] and parameters used for model registration/update APIs: OK
Current Behavior
My custom handler uses the OpenCV library. I edited config.properties and passed the --requirements-file parameter to torch-model-archiver, but requests to the running torchserve fail.
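To make the dependency concrete, here is a minimal sketch of what such a handler might look like (this is an illustration, not the actual mymodel_handler.py; the class and method names follow the common TorchServe custom-handler shape). cv2 is imported lazily so the module itself can be loaded even when the per-model pip install has not run:

```python
# Sketch only: a minimal custom vision handler whose preprocessing needs
# OpenCV. Names are illustrative stand-ins for handlers/mymodel_handler.py.

class MyModelHandler:
    """Hypothetical custom handler that depends on cv2."""

    def __init__(self):
        self.initialized = False

    def initialize(self, context):
        # cv2 comes from the bundled requirements.txt; if the per-model
        # pip install failed, this import raises and the worker never starts.
        import cv2  # noqa: F401
        self.initialized = True

    def handle(self, data, context):
        import cv2
        import numpy as np
        raw = data[0].get("data") or data[0].get("body")
        # Decode the request bytes into a BGR image before inference.
        img = cv2.imdecode(np.frombuffer(raw, dtype=np.uint8), cv2.IMREAD_COLOR)
        return [{"shape": list(img.shape)}]
```

Because the import happens inside initialize, the failure surfaces as a dead worker (and hence a 503 on /predictions) rather than an archive-time error.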
Steps to Reproduce
Step. 1.
docker run --rm -it -p 8080:8080 -p 8081:8081 \
-v $(pwd)/model_store:/home/model-server/model-store \
-v $(pwd)/scripts:/home/model-server/scripts \
-v $(pwd)/handlers:/home/model-server/handlers \
--name mar torchserve:latest
Step. 2.
docker exec -it --user root mar /bin/bash
torch-model-archiver --model-name mymodel_docker_import --version 1.0 \
--serialized-file $(pwd)/scripts/model.pt --handler $(pwd)/handlers/mymodel_handler.py \
--export-path $(pwd)/model-store --requirements-file $(pwd)/handlers/requirements.txt
Step. 2.1.
requirements.txt is
opencv-python==4.4.0.42
Step. 3.
torchserve --start --model-store model-store --models mymodel=mymodel_docker_import.mar --ts-config $(pwd)/model-store/config.properties
Step. 3.1.
config.properties is
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
default_workers_per_model=2
install_py_dep_per_model=true
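With install_py_dep_per_model=true, TorchServe runs pip against the model's bundled requirements.txt when the model is registered. The exact flags vary by TorchServe version, and the extraction path below is an assumption, but a roughly equivalent command can be built and run manually inside the container to surface the real pip error:

```python
# Sketch: approximate the per-model pip install TorchServe performs when
# install_py_dep_per_model=true. Flags and paths are assumptions for this
# container layout, not TorchServe's exact internals.
import subprocess  # used only for the optional manual run below

def build_pip_cmd(python_exe, requirements_path):
    # Install the model's dependencies with the server's own interpreter.
    return [python_exe, "-m", "pip", "install", "-U", "-r", requirements_path]

cmd = build_pip_cmd(
    "/home/venv/bin/python3",
    "/home/model-server/tmp/models/<model-dir>/requirements.txt",  # placeholder path
)
print(" ".join(cmd))

# Running this by hand inside the container shows the underlying failure,
# e.g. a permissions problem writing into the virtualenv:
# subprocess.run(cmd, check=True)
```

Running the printed command as the same user the workers run under is a quick way to distinguish a broken package from a broken environment.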
Step. 4.
curl http://localhost:8080/ping responds:
{ "status": "Healthy" }
Step. 5.
curl localhost:8081/models/ responds:
{
"models": [
{
"modelName": "mymodel",
"modelUrl": "mymodel_docker_import.mar"
}
]
}
BUT when I send a request I get 503 Error
2020-10-23 22:28:18,812 [INFO ] epollEventLoopGroup-3-4 ACCESS_LOG - /172.17.0.1:43698 "GET /models/ HTTP/1.1" 200 0
2020-10-23 22:28:18,812 [INFO ] epollEventLoopGroup-3-4 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:df5072c52f11,timestamp:null
2020-10-23 22:28:25,794 [INFO ] epollEventLoopGroup-3-1 ACCESS_LOG - /172.17.0.1:55526 **"POST /predictions/mymodel HTTP/1.1" 503** 80
2020-10-23 22:28:25,794 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - Requests5XX.Count:1|#Level:Host|#hostname:df5072c52f11,timestamp:null
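A 200 from /models/ alongside a 503 from /predictions usually means the model registered but no workers came up. The per-model management endpoint reports worker state; a small sketch of querying it (host and port taken from the config.properties above, run wherever the server is reachable):

```python
# Query the TorchServe management API for a model's worker status.
import json
from urllib.request import urlopen

def model_url(model_name, host="localhost", port=8081):
    # Describe-model endpoint of the management API.
    return "http://{}:{}/models/{}".format(host, port, model_name)

def describe_model(model_name):
    with urlopen(model_url(model_name)) as resp:
        return json.load(resp)

# Example (requires a running torchserve):
# describe_model("mymodel")  # the workers list is empty when the install failed
```

An empty workers list in the response confirms that the 503 comes from the failed dependency install rather than from the handler logic.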
Failure Logs [if any]
Torchserve version: 0.2.0
TS Home: /home/venv/lib/python3.6/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 2
Max heap size: 984 M
Python executable: /home/venv/bin/python3
Config file: /home/model-server/model-store/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://127.0.0.1:8082
Model Store: /home/model-server/model-store
Initial Models: mymodel=mymodel_docker_import.mar
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 2
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Prefer direct buffer: false
Custom python dependency for model allowed: true
Metrics report format: prometheus
Enable metrics API: true
2020-10-23 22:13:45,351 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: mymodel_docker_import.mar
2020-10-23 22:13:50,305 [INFO ] main org.pytorch.serve.archive.ModelArchive - eTag 9da49c85305c4c4a8081cd0be6bb5826
2020-10-23 22:13:50,328 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model mymodel
2020-10-23 22:13:50,328 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model mymodel
2020-10-23 22:13:50,329 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model mymodel loaded.
2020-10-23 22:14:08,732 [WARN ] main org.pytorch.serve.ModelServer - Failed to load model: mymodel_docker_import.mar
org.pytorch.serve.archive.ModelException: Custom pip package installation failed for mymodel
    at org.pytorch.serve.wlm.ModelManager.setupModelDependencies(ModelManager.java:190)
    at org.pytorch.serve.wlm.ModelManager.registerModel(ModelManager.java:125)
    at org.pytorch.serve.ModelServer.initModelStore(ModelServer.java:213)
    at org.pytorch.serve.ModelServer.start(ModelServer.java:308)
    at org.pytorch.serve.ModelServer.startAndWait(ModelServer.java:104)
    at org.pytorch.serve.ModelServer.main(ModelServer.java:85)
2020-10-23 22:14:08,741 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2020-10-23 22:14:08,884 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2020-10-23 22:14:08,885 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2020-10-23 22:14:08,902 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2020-10-23 22:14:08,902 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2020-10-23 22:14:08,910 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2020-10-23 22:14:09,079 [INFO ] pool-2-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:df5072c52f11,timestamp:1603491249
Resolution
@veronikayurchuk Try this Dockerfile (#724). The current version has a problem with the permissions of the virtual Python environment and the container user. In my case I rebuilt the container with those changes and it worked fine.
Replace the opencv-python dependency with opencv-python-headless.
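The headless package avoids the GUI-related system libraries (such as libGL) that slim container images typically lack. A small sketch of applying the swap to the bundled requirements file before re-archiving (the function name is illustrative):

```python
# Sketch: rewrite requirements.txt to use opencv-python-headless, which
# does not pull in GUI system libraries absent from slim containers.
from pathlib import Path

def use_headless_opencv(req_path):
    lines = Path(req_path).read_text().splitlines()
    fixed = [
        line.replace("opencv-python==", "opencv-python-headless==")
        if line.startswith("opencv-python==") else line
        for line in lines
    ]
    Path(req_path).write_text("\n".join(fixed) + "\n")

# Example: use_headless_opencv("handlers/requirements.txt"), then re-run
# torch-model-archiver so the .mar bundles the updated requirements file.
```

After rewriting the file, the .mar must be rebuilt with torch-model-archiver for the change to take effect, since the requirements file is baked into the archive.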