load_models all loading the same model 10 times before going unresponsive
🐛 Describe the bug
This is a continuation of #1779 as that discussion took a different route.
Error logs
Logs too long, moved to pastebin
Installation instructions
docker run --rm -it -p 8085:8085 -v $(pwd):/home/model-server/ pytorch/torchserve bash
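Inside this container, the MAR referenced in config.properties below can be rebuilt with torch-model-archiver (version 0.6, per the next section). A sketch with assumed file names; only the archiver version is stated in the issue:

torch-model-archiver \
    --model-name testmodel \
    --version 1.0 \
    --serialized-file pytorch_model.bin \
    --handler handler.py \
    --extra-files "config.json,tokenizer_config.json,vocab.txt,special_tokens_map.json" \
    --export-path /mnt/models/model-store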
Model Packaging
Packaged using model archiver 0.6
Model Handler
from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification
import anltk
import unicodedata
import torch

# Guide dependencies
from abc import ABC
import logging
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)
class TopicClassifier(BaseHandler, ABC):
    def __init__(self):
        super().__init__()
        self.initialized = False

    def initialize(self, ctx):
        logger.log(logging.INFO, "=============INITIALIZING TOPIC CLASSIFIER=============")
        self.manifest = ctx.manifest
        properties = ctx.system_properties
        model_path = properties.get("model_dir")
        self.device = torch.device(
            "cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu"
        )
        # Load config, tokenizer and model from the extracted MAR directory
        self.config = AutoConfig.from_pretrained(model_path)
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, config=self.config)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path, config=self.config)
        self.model.to(self.device)
        self.model.eval()
        # Optional mapping file: class labels come straight from the model config
        self.labels = list(self.config.label2id.keys())
        logger.log(logging.INFO, "Initialized Topic Classifier")
        self.initialized = True

    def preprocess_(self, query: str) -> str:
        # Normalize, tokenize and clean the raw text before encoding
        query_ = unicodedata.normalize('NFC', query)
        query_ = ' '.join(anltk.tokenize_words(query_))
        query_ = anltk.remove_non_alpha(query_, stop_list=' ?,:".')
        query_ = anltk.fold_white_spaces(query_)
        query_ = query_.strip()
        return query_
    def preprocess(self, data):
        logger.log(logging.INFO, "Preprocessing started")
        logger.log(logging.INFO, f"data is {data}")
        query = data[0]
        # Accept either {"body": {"text": ...}} or a bare {"text": ...} payload
        query = query.get("body", {"text": query.get("text", "")}).get("text", "")
        if not query.strip():
            raise Exception("No text found in query")
        query_ = self.preprocess_(query)
        logger.log(logging.INFO, f"query is {query_}")
        # tokens = self.tokenizer.tokenize(query_)  # Debugging only
        encoded_dict = self.tokenizer.encode_plus(
            query_,                      # Sentence to encode.
            add_special_tokens=True,     # Add '[CLS]' and '[SEP]'
            max_length=512,              # Pad & truncate all sentences.
            truncation=True,
            padding='max_length',        # Padding strategy
            return_attention_mask=True,  # Construct attn. masks.
            return_tensors='pt',         # Return pytorch tensors.
        )
        # return encoded_dict, tokens, query_
        return encoded_dict, query_
    def inference(self, inputs):
        logger.log(logging.INFO, "Inference started")
        with torch.no_grad():
            for key in inputs:  # Move all tensors to the target device first
                try:
                    inputs[key] = inputs[key].to(self.device)
                except AttributeError:
                    pass  # Non-tensor entries are left untouched
            outputs = self.model(**inputs)
        predictions = torch.nn.functional.softmax(outputs[0].squeeze(), dim=0)
        pred = torch.argmax(predictions, dim=0)
        correct = self.labels[pred.item()]
        logger.log(logging.INFO, f"Predicted: {correct}")
        class_dict = {}
        labeled_dict = {"Correct": correct, "Classes": class_dict}
        for label in self.labels:
            class_dict[label] = "{:.3f}".format(predictions[self.config.label2id[label]].item())
        return labeled_dict
    def postprocess(self, data: dict, query):
        # data["Preprocessed"] = query  # No need
        return [data]  # TorchServe expects a list with one entry per request


_service = TopicClassifier()


def handle(data, context):
    try:
        if not _service.initialized:
            _service.initialize(context)
        if data is None:
            return None
        logger.log(logging.INFO, f"Received data: {data}")
        inputs, query = _service.preprocess(data)
        output_dict = _service.inference(inputs)
        outputs = _service.postprocess(output_dict, query)
        return outputs
    except Exception as e:
        logger.error(e)
        raise e
config.properties
inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model-store
models={"newmodel":{"1.0":{"defaultVersion":true,"marName":"testmodel.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}
load_models=all
install_py_dep_per_model=true
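With this config, both the inference and management APIs listen on port 8085 and the model is registered as newmodel with minWorkers=1/maxWorkers=5. A quick way to check how many workers were actually spawned, and to send a request in the shape the handler's preprocess() expects (host and example text are illustrative):

# Describe the registered model and its workers via the management API
curl http://localhost:8085/models/newmodel

# Send an inference request; the handler reads body["text"] (or a bare "text" field)
curl -X POST http://localhost:8085/predictions/newmodel \
     -H "Content-Type: application/json" \
     -d '{"text": "example query"}'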
Versions
TorchServe Version is 0.6.0
Repro instructions
Run TorchServe with the above config and handler (a possible start command is sketched below).
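The exact start command isn't shown in the issue; something along these lines would launch TorchServe against the config and model store above (paths taken from config.properties):

torchserve --start \
    --ts-config config.properties \
    --model-store /mnt/models/model-store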
Possible Solution
Create a model snapshot and use that instead of load_models=all (see the sketch below).
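A rough sketch of that approach, assuming the default filesystem snapshot serializer; the flag names are real TorchServe CLI options, but the snapshot path depends on the local setup:

# Start once without --ncs so TorchServe writes a snapshot of the registered models
torchserve --start --ts-config config.properties --model-store /mnt/models/model-store

# Later, restart from the last snapshot instead of relying on load_models=all
torchserve --stop
torchserve --start --model-store /mnt/models/model-store --ts-config <path-to-snapshot-cfg>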
Issue Analytics
- Created a year ago
- Comments: 8
Top GitHub Comments
@maaquib I will try to replicate your changes tomorrow and see how they go for me. From what I've seen, these are the differences between us (the two startup styles are sketched below):
- You pass --ncs where I didn't.
- You pass models all on the command line, where I'm doing that in the config file.
I will let you know how it goes once I try it out. Thanks for your support.
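For reference, the two startup styles being compared would look roughly like this (a sketch; the exact commands aren't quoted in the thread):

# CLI-driven startup: --ncs disables config snapshots, "all" loads every MAR in the store
torchserve --start --ncs --model-store /mnt/models/model-store --models all

# Config-driven startup: rely on load_models=all and the models=... entry in config.properties
torchserve --start --ts-config config.properties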
@maaquib it's a huggingface model so it's referenced by name:
CAMeL-Lab/bert-base-arabic-camelbert-msa-sixteenth
Let me know if you need any more info. So far I've run this on different machines (and on a Kubeflow cluster) and all of them print INITIALIZING TOPIC CLASSIFIER more than 20 times, which is a log line I placed in the initialize function of the handler, as seen above.
Necessary Edit: The model above is actually the base model; I am then fine-tuning it as a HuggingFace topic classifier on exactly 3 classes, Topic1 - Topic2 - Topic3. If it comes to it, I can send you the checkpoints and the required files to build a MAR out of it, since the data used is open source and the training is just a POC, nothing copyrighted/licensed yet.