
Torchserve docker worker died error

See original GitHub issue

Hi, I am trying to deploy a trained Transformers BertForSequenceClassification model using TorchServe. When starting the server I keep getting a "Worker died" error and the model isn't loaded. I have followed every TorchServe example I could find to the letter, but the model still will not load, no matter what changes I make. The problem is that I don't even know how to debug TorchServe to find out what causes the failure, so I would really appreciate it if someone from the awesome community here could take a look and perhaps spot the problem right away.

I create the .mar file with the following command:

torch-model-archiver --model-name DocTag --version 1.0 --serialized-file Model/pytorch_model.bin --handler ./handler.py --extra-files "Model/config.json,./index_to_name.json" --export-path ./model_store -f

And I use this command to start the server:

torchserve --start --model-store model_store --models DocTag=DocTag.mar
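Once the server is up, the TorchServe management API can report whether the model actually has any healthy workers, which is useful when chasing a "Worker died" error. A minimal sketch, assuming the default management address 127.0.0.1:8081 and the model name DocTag registered above:

import json
import urllib.request

# Ask the TorchServe management API (default port 8081) to describe the model.
# The response lists each backend worker and its status, so a model whose
# workers keep dying shows up here with no healthy workers.
with urllib.request.urlopen("http://127.0.0.1:8081/models/DocTag") as resp:
    print(json.dumps(json.loads(resp.read()), indent=2))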

Here is my custom handler that I wrote:

import os
import logging
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from ts.torch_handler.base_handler import BaseHandler
from ts.utils.util import load_label_mapping, map_class_to_label


MAX_SEQ_LENGTH = 256

logger = logging.getLogger(__name__)


class InputFeatures(object):
    def __init__(self, input_ids, input_mask, segment_ids):
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.segment_ids = segment_ids


class DocumentTaggerHandler(BaseHandler):
    def __init__(self):
        super(DocumentTaggerHandler, self).__init__()
        self.tokenizer = None
        self.initialized = False

    def initialize(self, context):
        self.manifest = context.manifest
        properties = context.system_properties
        model_dir = properties.get("model_dir")
        self.map_location = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.device = torch.device(self.map_location + ":" + str(properties.get("gpu_id"))
                                   if torch.cuda.is_available() else self.map_location)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model.to(self.device)
        self.model.eval()
        logger.debug('Transformer model from path {0} loaded successfully'.format(model_dir))
        mapping_file_path = os.path.join(model_dir, "index_to_name.json")
        self.mapping = load_label_mapping(mapping_file_path)
        self.initialized = True

    def _preprocess_one_document(self, req):
        text = req.get('data')
        if text is None:
            text = req.get('body')
        logger.info("Received text: '%s'", text)
        tokens = self.tokenizer(text, max_length=MAX_SEQ_LENGTH, padding='max_length', truncation=True,
                                add_special_tokens=True, return_tensors='pt')
        input_ids = tokens['input_ids']
        segment_ids = tokens['token_type_ids']
        input_mask = tokens['attention_mask']
        assert len(input_ids[0]) == MAX_SEQ_LENGTH
        assert len(input_mask[0]) == MAX_SEQ_LENGTH
        assert len(segment_ids[0]) == MAX_SEQ_LENGTH
        return InputFeatures(input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids)

    def preprocess(self, data):
        documents = [self._preprocess_one_document(req) for req in data]
        input_ids = torch.cat([f.input_ids for f in documents]).to(self.device)
        input_mask = torch.cat([f.input_mask for f in documents]).to(self.device)
        segment_ids = torch.cat([f.segment_ids for f in documents]).to(self.device)
        data = {
            'input_ids': input_ids,
            'input_mask': input_mask,
            'segment_ids': segment_ids
        }
        return data

    def inference(self, data, *args, **kwargs):
        logits = self.model(data['input_ids'], data['input_mask'], data['segment_ids'])[0]
        print("This the output size from the Seq classification model", logits[0].size())
        print("This the output from the Seq classification model", logits)
        predicted_labels = []
        predicted_labels.extend(torch.sigmoid(logits).round().long().cpu().detach().numpy())
        return predicted_labels

    def postprocess(self, data):
        res = []
        labels = map_class_to_label(data, mapping=self.mapping)
        for i in range(len(labels)):
            tags = [label[0] for label in labels[i].items() if label[1] > 0]
            res.append({'label': tags, 'index': i})
        return res

  • torchserve version: 0.2.0
  • torch version: 1.5.0
  • torchvision version [if any]: 0.6.0
  • torchtext version [if any]: 0.6.0
  • transformers version: 3.4.0
  • java version: JDK 11.0.9
  • Operating System: Windows 10

Your Environment

  • Are you planning to deploy it using docker container? [yes/no]: yes but not right now
  • Is it a CPU or GPU environment?: GPU
  • Using a default/custom handler? [If possible upload/share custom handler/model]: Custom Handler
  • What kind of model is it e.g. vision, text, audio?: Text
  • Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]: Using a local model

Failure Logs [if any]

Attached is the ts_log:

2020-11-26 16:08:12,259 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.2.0
TS Home: D:\Programming\anaconda3\envs\testenv\Lib\site-packages
Current directory: D:\Programming\JetBrains\PycharmProjects\DocumentTagger
Temp directory: C:\Users\6E8C~1\AppData\Local\Temp
Number of GPUs: 1
Number of CPUs: 8
Max heap size: 4060 M
Python executable: d:\programming\anaconda3\envs\testenv\python.exe
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: D:\Programming\JetBrains\PycharmProjects\DocumentTagger\model_store
Initial Models: DocTag=DocTag.mar
Log dir: D:\Programming\JetBrains\PycharmProjects\DocumentTagger\logs
Metrics dir: D:\Programming\JetBrains\PycharmProjects\DocumentTagger\logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Prefer direct buffer: false
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
2020-11-26 16:08:12,264 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: DocTag.mar
2020-11-26 16:08:24,826 [INFO ] main org.pytorch.serve.archive.ModelArchive - eTag 10ea22dd648e4ff897f47f15887ddaf3
2020-11-26 16:08:24,869 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model DocTag
2020-11-26 16:08:24,870 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model DocTag
2020-11-26 16:08:24,870 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model DocTag loaded.
2020-11-26 16:08:24,871 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: DocTag, count: 1
2020-11-26 16:08:24,898 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: NioServerSocketChannel.
2020-11-26 16:08:25,050 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: None
2020-11-26 16:08:25,052 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]3896
2020-11-26 16:08:25,052 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2020-11-26 16:08:25,052 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.7.6
2020-11-26 16:08:25,052 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change null -> WORKER_STARTED
2020-11-26 16:08:25,057 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /127.0.0.1:9000
2020-11-26 16:08:25,202 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2020-11-26 16:08:25,202 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: NioServerSocketChannel.
2020-11-26 16:08:25,204 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2020-11-26 16:08:25,205 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: NioServerSocketChannel.
2020-11-26 16:08:25,207 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
2020-11-26 16:08:25,211 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: ('127.0.0.1', 9000).
2020-11-26 16:08:36,399 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process died.
2020-11-26 16:08:36,400 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2020-11-26 16:08:36,400 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 176, in <module>
2020-11-26 16:08:36,400 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     worker.run_server()
2020-11-26 16:08:36,401 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 148, in run_server
2020-11-26 16:08:36,401 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     self.handle_connection(cl_socket)
2020-11-26 16:08:36,401 [INFO ] nioEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2020-11-26 16:08:36,402 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 112, in handle_connection
2020-11-26 16:08:36,402 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2020-11-26 16:08:36,402 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     service, result, code = self.load_model(msg)
2020-11-26 16:08:36,403 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
	at java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432)
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:129)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
2020-11-26 16:08:36,403 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 85, in load_model
2020-11-26 16:08:36,406 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     service = model_loader.load(model_name, model_dir, handler, gpu, batch_size)
2020-11-26 16:08:36,406 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: DocTag, error: Worker died.
2020-11-26 16:08:36,414 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "d:\programming\anaconda3\envs\testenv\lib\site-packages\ts\model_loader.py", line 106, in load
2020-11-26 16:08:36,414 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2020-11-26 16:08:36,414 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     model_class_definitions))
2020-11-26 16:08:36,415 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-DocTag_1.0-stderr
2020-11-26 16:08:36,416 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - ValueError: Expected only one class in custom service code or a function entry point [<class 'handler.DocumentTaggerHandler'>, <class 'handler.InputFeatures'>]
2020-11-26 16:08:36,417 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-DocTag_1.0-stdout
2020-11-26 16:08:36,418 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
2020-11-26 16:08:36,423 [INFO ] W-9000-DocTag_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-DocTag_1.0-stderr
2020-11-26 16:08:36,423 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-DocTag_1.0-stdout
2020-11-26 16:08:37,635 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: None
2020-11-26 16:08:37,636 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]17128
2020-11-26 16:08:37,636 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2020-11-26 16:08:37,636 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2020-11-26 16:08:37,636 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.7.6
2020-11-26 16:08:37,637 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /127.0.0.1:9000
2020-11-26 16:08:37,640 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: ('127.0.0.1', 9000).
2020-11-26 16:08:40,574 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process died.
2020-11-26 16:08:40,574 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2020-11-26 16:08:40,575 [INFO ] nioEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2020-11-26 16:08:40,575 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 176, in <module>
2020-11-26 16:08:40,576 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2020-11-26 16:08:40,576 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     worker.run_server()
2020-11-26 16:08:40,576 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
	at java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432)
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:129)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
2020-11-26 16:08:40,577 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 148, in run_server
2020-11-26 16:08:40,577 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: DocTag, error: Worker died.
2020-11-26 16:08:40,577 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     self.handle_connection(cl_socket)
2020-11-26 16:08:40,578 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2020-11-26 16:08:40,578 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 112, in handle_connection
2020-11-26 16:08:40,578 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-DocTag_1.0-stderr
2020-11-26 16:08:40,580 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     service, result, code = self.load_model(msg)
2020-11-26 16:08:40,581 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-DocTag_1.0-stdout
2020-11-26 16:08:40,581 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 85, in load_model
2020-11-26 16:08:40,581 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
2020-11-26 16:08:40,582 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-DocTag_1.0-stdout
2020-11-26 16:08:40,587 [INFO ] W-9000-DocTag_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-DocTag_1.0-stderr
2020-11-26 16:08:41,765 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: None
2020-11-26 16:08:41,766 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]8792
2020-11-26 16:08:41,767 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2020-11-26 16:08:41,767 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2020-11-26 16:08:41,767 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.7.6
2020-11-26 16:08:41,767 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /127.0.0.1:9000
2020-11-26 16:08:41,771 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: ('127.0.0.1', 9000).
2020-11-26 16:08:44,682 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process died.
2020-11-26 16:08:44,682 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2020-11-26 16:08:44,683 [INFO ] nioEventLoopGroup-5-3 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2020-11-26 16:08:44,683 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 176, in <module>
2020-11-26 16:08:44,684 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2020-11-26 16:08:44,684 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     worker.run_server()
2020-11-26 16:08:44,684 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
	at java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432)
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:129)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
2020-11-26 16:08:44,685 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 148, in run_server
2020-11-26 16:08:44,685 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: DocTag, error: Worker died.
2020-11-26 16:08:44,685 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     self.handle_connection(cl_socket)
2020-11-26 16:08:44,686 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2020-11-26 16:08:44,686 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 112, in handle_connection
2020-11-26 16:08:44,686 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-DocTag_1.0-stderr
2020-11-26 16:08:44,687 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     service, result, code = self.load_model(msg)
2020-11-26 16:08:44,688 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-DocTag_1.0-stdout
2020-11-26 16:08:44,688 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 85, in load_model
2020-11-26 16:08:44,689 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 2 seconds.
2020-11-26 16:08:44,689 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-DocTag_1.0-stdout
2020-11-26 16:08:44,694 [INFO ] W-9000-DocTag_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-DocTag_1.0-stderr
2020-11-26 16:08:46,156 [INFO ] nioEventLoopGroup-2-1 org.pytorch.serve.ModelServer - Inference model server stopped.
2020-11-26 16:08:46,156 [INFO ] nioEventLoopGroup-2-2 org.pytorch.serve.ModelServer - Management model server stopped.
2020-11-26 16:08:46,156 [INFO ] nioEventLoopGroup-2-1 org.pytorch.serve.ModelServer - Metrics model server stopped.
2020-11-26 16:08:46,842 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: None
2020-11-26 16:08:46,843 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]14256
2020-11-26 16:08:46,843 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2020-11-26 16:08:46,843 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2020-11-26 16:08:46,843 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.7.6
2020-11-26 16:08:46,844 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /127.0.0.1:9000
2020-11-26 16:08:46,846 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: ('127.0.0.1', 9000).
2020-11-26 16:08:48,256 [INFO ] main org.pytorch.serve.ModelServer - Torchserve stopped.

I hope this is all enough info for you to be able to assist me here, as I’ve really hit a wall with this issue. If any additional details are required please let me know!

Thanks!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
OmriPi commented, Nov 26, 2020

@harshbafna Thank you!

I have moved the InputFeatures class into a separate file and included it in the --extra-files flag. After that I got a new error saying the tokenizer's "vocab.txt" file was missing, so I went ahead and added the tokenizer configuration files as well. Works like a charm now! Thanks so much!
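For anyone hitting the same ValueError: the handler module passed to --handler must define only a single class (or a module-level entry-point function), so helper classes need to live in a separate module shipped via --extra-files. A minimal sketch of what that change can look like; the module name input_features.py and the exact list of tokenizer files are assumptions, not taken verbatim from the issue:

# input_features.py -- helper class moved out of handler.py, so that handler.py
# defines only the DocumentTaggerHandler class TorchServe is looking for.
class InputFeatures(object):
    def __init__(self, input_ids, input_mask, segment_ids):
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.segment_ids = segment_ids

# handler.py then imports the helper instead of defining it:
# from input_features import InputFeatures

The archive is then rebuilt with the new module and the tokenizer files (vocab.txt plus whatever AutoTokenizer.save_pretrained wrote next to it) added to --extra-files, along the lines of:

torch-model-archiver --model-name DocTag --version 1.0 --serialized-file Model/pytorch_model.bin --handler ./handler.py --extra-files "Model/config.json,./index_to_name.json,./input_features.py,Model/vocab.txt,Model/tokenizer_config.json,Model/special_tokens_map.json" --export-path ./model_store -f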

0 reactions
OmriPi commented, Dec 1, 2020

Based on the following, it looks like the transformers package is missing in the docker image: No module named 'transformers'

@dhaniram-kshirsagar That is strange, I thought all required files were included in the archived model… I am using the official torchserve docker image, does it make sense that I have to edit it for every package I want to include? I'm not a Docker pro, so I thought it should work out of the box with the official container…
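The .mar archive only bundles the model artifacts and handler code, not the Python packages the handler imports, so the serving image itself needs transformers installed. A minimal sketch of one way to do that by extending the official image (the image tag and the version pin are assumptions):

# Dockerfile: extend the official TorchServe image with the packages
# imported by the custom handler (here: transformers).
FROM pytorch/torchserve:latest
RUN pip install transformers==3.4.0

Newer TorchServe releases can also install model-level dependencies at load time (install_py_dep_per_model=true in config.properties together with a requirements file passed to torch-model-archiver), but extending the image is the most direct route.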


Top Results From Across the Web

Object Detection worker dies #281 - pytorch/serve - GitHub
I am deploying a FasterRCNN model with 2 workers, and when I POST an image to the inference API as follows: curl -X...
python - loading model failed in torchserving - Stack Overflow
i am using the model from kaggle. I presume you got the model from https://www.kaggle.com/pytorch/vgg16. I think you are loading the model ...
Torch serve handler doesnt not load state_dict in Docker
I have a saved model which i am trying to serve it with Docker. ... W-9010-ponzi_v20211122-stdout MODEL_LOG - Backend worker process died.
Notes on a failed TorchServe deployment of Yolov5 - Zhihu (知乎专栏)
After packaging succeeded, deployment failed with "Load model failed: yolo_test, error: Worker died." It still feels like the handler is the problem; would you be open to discussing it? 2021-07-27.
Impressions from using TorchServe (v0.3.0) - Zenn
Following the "Quick start with docker: Start a container with a ..." steps in the README ... "Worker died. ... OMP: Error #15: Initializing libiomp5md.dll, ...
