TorchServe Docker "Worker died" error
Hi, I am trying to deploy a trained Transformers BertForSequenceClassification model using TorchServe. When I start the server I keep getting a "Worker died" error and the model never loads. I have followed every TorchServe example I could find to the letter, but the model still will not load, no matter what I change. The bigger problem is that I don't even know how to debug TorchServe and find out what causes it to fail, so I would really appreciate it if someone from the awesome community here could take a look and maybe spot the problem right away.
I create the .mar file using the following command:
torch-model-archiver --model-name DocTag --version 1.0 --serialized-file Model/pytorch_model.bin --handler ./handler.py --extra-files "Model/config.json,./index_to_name.json" --export-path ./model_store -f
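Since the worker dies while loading the model, a quick sanity check is to list the archive's contents. A .mar file is just a zip archive, so Python's built-in zipfile CLI can list it (assuming the archive was written to ./model_store as above):

python -m zipfile -l model_store/DocTag.mar

This should show pytorch_model.bin, config.json, index_to_name.json and handler.py.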
And use this command to start the server:
torchserve --start --model-store model_store --models DocTag=DocTag.mar
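Once the server is up, the management API (default port 8081) can be asked whether any worker actually became healthy, which is often quicker than digging through the logs:

curl http://127.0.0.1:8081/models/DocTag

This describes the registered model and the status of its workers.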
Here is my custom handler that I wrote:
import os
import logging

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

from ts.torch_handler.base_handler import BaseHandler
from ts.utils.util import load_label_mapping, map_class_to_label

MAX_SEQ_LENGTH = 256

logger = logging.getLogger(__name__)


class InputFeatures(object):
    def __init__(self, input_ids, input_mask, segment_ids):
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.segment_ids = segment_ids


class DocumentTaggerHandler(BaseHandler):
    def __init__(self):
        super(DocumentTaggerHandler, self).__init__()
        self.tokenizer = None
        self.initialized = False

    def initialize(self, context):
        self.manifest = context.manifest
        properties = context.system_properties
        model_dir = properties.get("model_dir")
        self.map_location = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.device = torch.device(self.map_location + ":" + str(properties.get("gpu_id"))
                                   if torch.cuda.is_available() else self.map_location)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model.to(self.device)
        self.model.eval()
        logger.debug('Transformer model from path {0} loaded successfully'.format(model_dir))
        mapping_file_path = os.path.join(model_dir, "index_to_name.json")
        self.mapping = load_label_mapping(mapping_file_path)
        self.initialized = True

    def _preprocess_one_document(self, req):
        text = req.get('data')
        if text is None:
            text = req.get('body')
        logger.info("Received text: '%s'", text)
        tokens = self.tokenizer(text, max_length=MAX_SEQ_LENGTH, padding='max_length', truncation=True,
                                add_special_tokens=True, return_tensors='pt')
        input_ids = tokens['input_ids']
        segment_ids = tokens['token_type_ids']
        input_mask = tokens['attention_mask']
        assert len(input_ids[0]) == MAX_SEQ_LENGTH
        assert len(input_mask[0]) == MAX_SEQ_LENGTH
        assert len(segment_ids[0]) == MAX_SEQ_LENGTH
        return InputFeatures(input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids)

    def preprocess(self, data):
        documents = [self._preprocess_one_document(req) for req in data]
        input_ids = torch.cat([f.input_ids for f in documents]).to(self.device)
        input_mask = torch.cat([f.input_mask for f in documents]).to(self.device)
        segment_ids = torch.cat([f.segment_ids for f in documents]).to(self.device)
        data = {
            'input_ids': input_ids,
            'input_mask': input_mask,
            'segment_ids': segment_ids
        }
        return data

    def inference(self, data, *args, **kwargs):
        logits = self.model(data['input_ids'], data['input_mask'], data['segment_ids'])[0]
        print("This the output size from the Seq classification model", logits[0].size())
        print("This the output from the Seq classification model", logits)
        predicted_labels = []
        predicted_labels.extend(torch.sigmoid(logits).round().long().cpu().detach().numpy())
        return predicted_labels

    def postprocess(self, data):
        res = []
        labels = map_class_to_label(data, mapping=self.mapping)
        for i in range(len(labels)):
            tags = [label[0] for label in labels[i].items() if label[1] > 0]
            res.append({'label': tags, 'index': i})
        return res
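For local debugging, the handler can also be exercised outside TorchServe with a stand-in context object. This is only a minimal sketch under some assumptions: FakeContext is not a real TorchServe class and just mimics the two attributes the handler reads, and it assumes index_to_name.json has been copied next to the model files in ./Model. It surfaces problems inside the handler itself (for example missing tokenizer files), but not errors raised by TorchServe's own model loader.

# local_debug.py -- minimal sketch, run from the directory containing handler.py
from handler import DocumentTaggerHandler

class FakeContext:
    # stand-in for ts.context.Context with only the attributes this handler reads
    manifest = {}
    system_properties = {"model_dir": "./Model", "gpu_id": 0}

handler = DocumentTaggerHandler()
handler.initialize(FakeContext())
batch = [{"data": "Sample document text to tag."}]
print(handler.postprocess(handler.inference(handler.preprocess(batch))))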
Your Environment
- torchserve version: 0.2.0
- torch version: 1.5.0
- torchvision version [if any]: 0.6.0
- torchtext version [if any]: 0.6.0
- transformers version: 3.4.0
- java version: JDK 11.0.9
- Operating System: Windows 10
- Are you planning to deploy it using docker container? [yes/no]: yes, but not right now
- Is it a CPU or GPU environment?: GPU
- Using a default/custom handler? [If possible upload/share custom handler/model]: Custom handler
- What kind of model is it e.g. vision, text, audio?: Text
- Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]: Using a local model
Failure Logs [if any]
Attached is the ts_log:
2020-11-26 16:08:12,259 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.2.0
TS Home: D:\Programming\anaconda3\envs\testenv\Lib\site-packages
Current directory: D:\Programming\JetBrains\PycharmProjects\DocumentTagger
Temp directory: C:\Users\6E8C~1\AppData\Local\Temp
Number of GPUs: 1
Number of CPUs: 8
Max heap size: 4060 M
Python executable: d:\programming\anaconda3\envs\testenv\python.exe
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: D:\Programming\JetBrains\PycharmProjects\DocumentTagger\model_store
Initial Models: DocTag=DocTag.mar
Log dir: D:\Programming\JetBrains\PycharmProjects\DocumentTagger\logs
Metrics dir: D:\Programming\JetBrains\PycharmProjects\DocumentTagger\logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Prefer direct buffer: false
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
2020-11-26 16:08:12,264 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: DocTag.mar
2020-11-26 16:08:24,826 [INFO ] main org.pytorch.serve.archive.ModelArchive - eTag 10ea22dd648e4ff897f47f15887ddaf3
2020-11-26 16:08:24,869 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model DocTag
2020-11-26 16:08:24,870 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model DocTag
2020-11-26 16:08:24,870 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model DocTag loaded.
2020-11-26 16:08:24,871 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: DocTag, count: 1
2020-11-26 16:08:24,898 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: NioServerSocketChannel.
2020-11-26 16:08:25,050 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: None
2020-11-26 16:08:25,052 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]3896
2020-11-26 16:08:25,052 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2020-11-26 16:08:25,052 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.7.6
2020-11-26 16:08:25,052 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change null -> WORKER_STARTED
2020-11-26 16:08:25,057 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /127.0.0.1:9000
2020-11-26 16:08:25,202 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2020-11-26 16:08:25,202 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: NioServerSocketChannel.
2020-11-26 16:08:25,204 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2020-11-26 16:08:25,205 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: NioServerSocketChannel.
2020-11-26 16:08:25,207 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
2020-11-26 16:08:25,211 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: ('127.0.0.1', 9000).
2020-11-26 16:08:36,399 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process died.
2020-11-26 16:08:36,400 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2020-11-26 16:08:36,400 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 176, in <module>
2020-11-26 16:08:36,400 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - worker.run_server()
2020-11-26 16:08:36,401 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 148, in run_server
2020-11-26 16:08:36,401 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - self.handle_connection(cl_socket)
2020-11-26 16:08:36,401 [INFO ] nioEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2020-11-26 16:08:36,402 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 112, in handle_connection
2020-11-26 16:08:36,402 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2020-11-26 16:08:36,402 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - service, result, code = self.load_model(msg)
2020-11-26 16:08:36,403 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
at java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:129)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2020-11-26 16:08:36,403 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 85, in load_model
2020-11-26 16:08:36,406 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - service = model_loader.load(model_name, model_dir, handler, gpu, batch_size)
2020-11-26 16:08:36,406 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: DocTag, error: Worker died.
2020-11-26 16:08:36,414 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "d:\programming\anaconda3\envs\testenv\lib\site-packages\ts\model_loader.py", line 106, in load
2020-11-26 16:08:36,414 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2020-11-26 16:08:36,414 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - model_class_definitions))
2020-11-26 16:08:36,415 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-DocTag_1.0-stderr
2020-11-26 16:08:36,416 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - ValueError: Expected only one class in custom service code or a function entry point [<class 'handler.DocumentTaggerHandler'>, <class 'handler.InputFeatures'>]
2020-11-26 16:08:36,417 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-DocTag_1.0-stdout
2020-11-26 16:08:36,418 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
2020-11-26 16:08:36,423 [INFO ] W-9000-DocTag_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-DocTag_1.0-stderr
2020-11-26 16:08:36,423 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-DocTag_1.0-stdout
2020-11-26 16:08:37,635 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: None
2020-11-26 16:08:37,636 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]17128
2020-11-26 16:08:37,636 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2020-11-26 16:08:37,636 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2020-11-26 16:08:37,636 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.7.6
2020-11-26 16:08:37,637 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /127.0.0.1:9000
2020-11-26 16:08:37,640 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: ('127.0.0.1', 9000).
2020-11-26 16:08:40,574 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process died.
2020-11-26 16:08:40,574 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2020-11-26 16:08:40,575 [INFO ] nioEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2020-11-26 16:08:40,575 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 176, in <module>
2020-11-26 16:08:40,576 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2020-11-26 16:08:40,576 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - worker.run_server()
2020-11-26 16:08:40,576 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
at java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:129)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2020-11-26 16:08:40,577 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 148, in run_server
2020-11-26 16:08:40,577 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: DocTag, error: Worker died.
2020-11-26 16:08:40,577 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - self.handle_connection(cl_socket)
2020-11-26 16:08:40,578 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2020-11-26 16:08:40,578 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 112, in handle_connection
2020-11-26 16:08:40,578 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-DocTag_1.0-stderr
2020-11-26 16:08:40,580 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - service, result, code = self.load_model(msg)
2020-11-26 16:08:40,581 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-DocTag_1.0-stdout
2020-11-26 16:08:40,581 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 85, in load_model
2020-11-26 16:08:40,581 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
2020-11-26 16:08:40,582 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-DocTag_1.0-stdout
2020-11-26 16:08:40,587 [INFO ] W-9000-DocTag_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-DocTag_1.0-stderr
2020-11-26 16:08:41,765 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: None
2020-11-26 16:08:41,766 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]8792
2020-11-26 16:08:41,767 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2020-11-26 16:08:41,767 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2020-11-26 16:08:41,767 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.7.6
2020-11-26 16:08:41,767 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /127.0.0.1:9000
2020-11-26 16:08:41,771 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: ('127.0.0.1', 9000).
2020-11-26 16:08:44,682 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process died.
2020-11-26 16:08:44,682 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2020-11-26 16:08:44,683 [INFO ] nioEventLoopGroup-5-3 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2020-11-26 16:08:44,683 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 176, in <module>
2020-11-26 16:08:44,684 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2020-11-26 16:08:44,684 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - worker.run_server()
2020-11-26 16:08:44,684 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
at java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:129)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2020-11-26 16:08:44,685 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 148, in run_server
2020-11-26 16:08:44,685 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: DocTag, error: Worker died.
2020-11-26 16:08:44,685 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - self.handle_connection(cl_socket)
2020-11-26 16:08:44,686 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2020-11-26 16:08:44,686 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 112, in handle_connection
2020-11-26 16:08:44,686 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-DocTag_1.0-stderr
2020-11-26 16:08:44,687 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - service, result, code = self.load_model(msg)
2020-11-26 16:08:44,688 [WARN ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-DocTag_1.0-stdout
2020-11-26 16:08:44,688 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "D:\Programming\anaconda3\envs\testenv\Lib\site-packages\ts\model_service_worker.py", line 85, in load_model
2020-11-26 16:08:44,689 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 2 seconds.
2020-11-26 16:08:44,689 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-DocTag_1.0-stdout
2020-11-26 16:08:44,694 [INFO ] W-9000-DocTag_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-DocTag_1.0-stderr
2020-11-26 16:08:46,156 [INFO ] nioEventLoopGroup-2-1 org.pytorch.serve.ModelServer - Inference model server stopped.
2020-11-26 16:08:46,156 [INFO ] nioEventLoopGroup-2-2 org.pytorch.serve.ModelServer - Management model server stopped.
2020-11-26 16:08:46,156 [INFO ] nioEventLoopGroup-2-1 org.pytorch.serve.ModelServer - Metrics model server stopped.
2020-11-26 16:08:46,842 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: None
2020-11-26 16:08:46,843 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]14256
2020-11-26 16:08:46,843 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2020-11-26 16:08:46,843 [DEBUG] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-DocTag_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2020-11-26 16:08:46,843 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.7.6
2020-11-26 16:08:46,844 [INFO ] W-9000-DocTag_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /127.0.0.1:9000
2020-11-26 16:08:46,846 [INFO ] W-9000-DocTag_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: ('127.0.0.1', 9000).
2020-11-26 16:08:48,256 [INFO ] main org.pytorch.serve.ModelServer - Torchserve stopped.
I hope this is enough info for you to assist me, as I've really hit a wall with this issue. If any additional details are required, please let me know!
Thanks!
@harshbafna Thank you!
I have moved the InputFeatures class into a separate file and included it in the --extra-files flag. After that I got a new error saying the tokenizer's "vocab.txt" file is missing, so I went ahead and added the tokenizer configuration files as well. Works like a charm now! Thanks so much!
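For reference, the layout after the fix looks roughly like this. It is only a sketch: the file name input_features.py is my own choice, and which tokenizer files exist beyond vocab.txt depends on how the tokenizer was saved.

# input_features.py -- moved out of handler.py so the handler module defines only one class
class InputFeatures(object):
    def __init__(self, input_ids, input_mask, segment_ids):
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.segment_ids = segment_ids

# handler.py then imports it instead of defining it:
from input_features import InputFeatures

And the archive is rebuilt with the extra file plus the tokenizer files added to --extra-files, for example:

torch-model-archiver --model-name DocTag --version 1.0 --serialized-file Model/pytorch_model.bin --handler ./handler.py --extra-files "Model/config.json,./index_to_name.json,./input_features.py,Model/vocab.txt,Model/tokenizer_config.json" --export-path ./model_store -f

Everything listed in --extra-files is extracted into the worker's model_dir, so AutoTokenizer.from_pretrained(model_dir) can then find the vocab.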
@dhaniram-kshirsagar That is strange, I thought all required files were included in the archived model… I am using the official torchserve Docker image; does it make sense that I have to edit it for every package I want to include? I'm not a Docker pro, so I assumed it would work out of the box with the official container…