TorchServe Docker fails to run on an existing .mar file
Context
I ran torch-model-archiver on a different machine to create a .mar file with a custom handler for a transformer model, using this command:
torch-model-archiver --model-name TranslationClassifier --version 1.0 --serialized-file /home/ayush/transformer_model/pytorch_model.bin --handler ./translation_model/text_handler.py --extra-files "./transformer_model/config.json,./transformer_model/special_tokens_map.json,./transformer_model/tokenizer_config.json,./transformer_model/sentencepiece.bpe.model"
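Since a .mar file is a standard zip archive, one quick sanity check (a sketch, assuming the archive was created in the current directory by the command above) is to list its contents and confirm that text_handler.py and the extra files were actually packaged:
unzip -l TranslationClassifier.mar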
It took about 20 minutes and the .mar file was created correctly. I was able to verify locally that TorchServe indeed works on that system, using the following command:
torchserve --start --model-store model_store --models my_tc=TranslationClassifier.mar
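For reference, one way to confirm the local server is healthy and the model is registered, using TorchServe's standard inference and management endpoints (sample_input.txt here is a hypothetical payload, not taken from the original report):
curl http://127.0.0.1:8080/ping
curl http://127.0.0.1:8081/models
curl http://127.0.0.1:8080/predictions/my_tc -T sample_input.txt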
Expected Behavior
In order to run this on Kubernetes, I took the pre-existing pytorch/torchserve:latest-gpu image from Docker Hub, so that I could run in a different environment by using the .mar file directly with this command:
sudo docker run -p 8080:8080 -p 8081:8081 -p 8082:8082 -p 7070:7070 -p 7071:7071 --mount type=bind,source=/home/ayush,target=/home/ayush/model_store pytorch/torchserve:latest-gpu torchserve --model-store /home/ayush/model_store --models my_tc=TranslationClassifier.mar
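Note that the official TorchServe image is documented to use /home/model-server as its working directory, so a common variant (a sketch of an alternative mount, assuming the .mar file sits in /home/ayush/model_store on the host; not a confirmed fix for this issue) is to bind-mount into the image's default model-store path:
sudo docker run -p 8080:8080 -p 8081:8081 --mount type=bind,source=/home/ayush/model_store,target=/home/model-server/model-store pytorch/torchserve:latest-gpu torchserve --model-store /home/model-server/model-store --models my_tc=TranslationClassifier.mar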
Current Behavior
Execution fails when running that Docker container, with the following error logs:
2021-03-12 21:13:43,128 [INFO ] W-9002-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/usr/local/lib/python3.6/dist-packages/ts/model_service_worker.py", line 182, in <module>
2021-03-12 21:13:43,128 [INFO ] W-9002-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     worker.run_server()
2021-03-12 21:13:43,128 [INFO ] W-9002-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/usr/local/lib/python3.6/dist-packages/ts/model_service_worker.py", line 154, in run_server
2021-03-12 21:13:43,128 [INFO ] W-9002-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     self.handle_connection(cl_socket)
2021-03-12 21:13:43,128 [INFO ] W-9002-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/usr/local/lib/python3.6/dist-packages/ts/model_service_worker.py", line 116, in handle_connection
2021-03-12 21:13:43,129 [INFO ] W-9002-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     service, result, code = self.load_model(msg)
2021-03-12 21:13:43,129 [INFO ] W-9002-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/usr/local/lib/python3.6/dist-packages/ts/model_service_worker.py", line 89, in load_model
2021-03-12 21:13:43,129 [INFO ] W-9002-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     service = model_loader.load(model_name, model_dir, handler, gpu, batch_size, envelope)
2021-03-12 21:13:43,129 [INFO ] W-9002-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/usr/local/lib/python3.6/dist-packages/ts/model_loader.py", line 83, in load
2021-03-12 21:13:43,130 [INFO ] W-9002-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     module = self._load_default_handler(handler)
2021-03-12 21:13:43,130 [INFO ] W-9002-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/usr/local/lib/python3.6/dist-packages/ts/model_loader.py", line 120, in _load_default_handler
2021-03-12 21:13:43,130 [INFO ] W-9002-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     module = importlib.import_module(module_name, 'ts.torch_handler')
2021-03-12 21:13:43,131 [INFO ] epollEventLoopGroup-5-7 org.pytorch.serve.wlm.WorkerThread - 9002 Worker disconnected. WORKER_STARTED
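The truncated traceback ends in model_loader.py's _load_default_handler, which suggests the worker is falling back to importing the handler as a built-in ts.torch_handler module instead of loading the packaged text_handler.py. One way to see what the archive actually recorded (assuming the standard MAR layout, where the manifest lives under MAR-INF/) is to print the manifest and inspect its handler field:
unzip -p TranslationClassifier.mar MAR-INF/MANIFEST.json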
Steps to Reproduce
- Run torch-model-archiver on a model and copy the resulting .mar file to a different machine
- Run the TorchServe Docker image with that existing .mar file. It looks like it is unable to find the custom handler that was used when running torch-model-archiver. My understanding is that the .mar file should have captured this information, so that running torchserve --start --model-store model_store --models my_tc=TranslationClassifier.mar in a different environment should work out of the box and not fail to recognize the custom handler …

@ayushch3 @kqhuynguyen Please add install_py_dep_per_model=true to config.properties if your model needs to install packages, and then copy or mount that config.properties into your Docker container.
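For anyone following along, a minimal setup along those lines might look like this (a sketch: the host paths and --ts-config wiring are assumptions, and install_py_dep_per_model=true only takes effect if the model was archived with a requirements file via torch-model-archiver's --requirements-file flag):
echo "install_py_dep_per_model=true" > config.properties
sudo docker run -p 8080:8080 -p 8081:8081 --mount type=bind,source=/home/ayush/config.properties,target=/home/model-server/config.properties --mount type=bind,source=/home/ayush/model_store,target=/home/model-server/model-store pytorch/torchserve:latest-gpu torchserve --model-store /home/model-server/model-store --ts-config /home/model-server/config.properties --models my_tc=TranslationClassifier.mar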
@dhanainme Just to clarify: all of these issues occur only when trying to run TorchServe inside a Docker image; there are no issues when running on a standalone Ubuntu system. The Docker image is necessary to deploy this as a microservice, but TorchServe just fails without emitting any failure logs, so it's hard to debug what's going wrong.
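When the container dies silently like this, a couple of generic Docker steps usually surface TorchServe's own logs (a sketch; <container> is a placeholder for the container name or ID, and the logs/ path assumes the image's default /home/model-server working directory):
sudo docker logs <container>
sudo docker exec -it <container> ls /home/model-server/logs
sudo docker exec -it <container> cat /home/model-server/logs/model_log.log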