load_models all loading the same model 10 times before going unresponsive
🐛 Describe the bug
This is a continuation of #1779 as that discussion took a different route.
Error logs
Logs too long, moved to pastebin
Installation instructions
docker run --rm -it -p 8085:8085 -v $(pwd):/home/model-server/ pytorch/torchserve bash
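Inside this container, the MAR referenced in config.properties below can be rebuilt with torch-model-archiver (version 0.6, per the next section). A sketch with assumed file names; only the archiver version is stated in the issue:

torch-model-archiver \
    --model-name testmodel \
    --version 1.0 \
    --serialized-file pytorch_model.bin \
    --handler handler.py \
    --extra-files "config.json,tokenizer_config.json,vocab.txt,special_tokens_map.json" \
    --export-path /mnt/models/model-store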
Model Packaging
Packaged using model archiver 0.6
Model Handler
from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification
import anltk
import unicodedata
import torch

# Guide dependencies
from abc import ABC
import logging
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)
class TopicClassifier(BaseHandler, ABC):
    def __init__(self):
        super().__init__()
        self.initialized = False

    def initialize(self, ctx):
        logger.log(logging.INFO, "=============INITIALIZING TOPIC CLASSIFIER=============")
        self.manifest = ctx.manifest
        properties = ctx.system_properties
        model_path = properties.get("model_dir")
        self.device = torch.device(
            "cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu"
        )
        # Load config, tokenizer and model from the extracted MAR directory
        self.config = AutoConfig.from_pretrained(model_path)
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, config=self.config)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path, config=self.config)
        self.model.to(self.device)
        self.model.eval()
        # Optional mapping file: class labels come straight from the model config
        self.labels = list(self.config.label2id.keys())
        logger.log(logging.INFO, "Initialized Topic Classifier")
        self.initialized = True

    def preprocess_(self, query: str) -> str:
        # Normalize, tokenize and clean the raw text before encoding
        query_ = unicodedata.normalize('NFC', query)
        query_ = ' '.join(anltk.tokenize_words(query_))
        query_ = anltk.remove_non_alpha(query_, stop_list=' ?,:".')
        query_ = anltk.fold_white_spaces(query_)
        query_ = query_.strip()
        return query_
    def preprocess(self, data):
        logger.log(logging.INFO, "Preprocessing started")
        logger.log(logging.INFO, f"data is {data}")
        query = data[0]
        # Accept either {"body": {"text": ...}} or a bare {"text": ...} payload
        query = query.get("body", {"text": query.get("text", "")}).get("text", "")
        if not query.strip():
            raise Exception("No text found in query")
        query_ = self.preprocess_(query)
        logger.log(logging.INFO, f"query is {query_}")
        # tokens = self.tokenizer.tokenize(query_)  # Debugging only
        encoded_dict = self.tokenizer.encode_plus(
            query_,                      # Sentence to encode.
            add_special_tokens=True,     # Add '[CLS]' and '[SEP]'
            max_length=512,              # Pad & truncate all sentences.
            truncation=True,
            padding='max_length',        # Padding strategy
            return_attention_mask=True,  # Construct attn. masks.
            return_tensors='pt',         # Return pytorch tensors.
        )
        # return encoded_dict, tokens, query_
        return encoded_dict, query_
    def inference(self, inputs):
        logger.log(logging.INFO, "Inference started")
        with torch.no_grad():
            for key in inputs:  # Move all tensors to the target device first
                try:
                    inputs[key] = inputs[key].to(self.device)
                except AttributeError:
                    pass  # Non-tensor entries are left untouched
            outputs = self.model(**inputs)
        predictions = torch.nn.functional.softmax(outputs[0].squeeze(), dim=0)
        pred = torch.argmax(predictions, dim=0)
        correct = self.labels[pred.item()]
        logger.log(logging.INFO, f"Predicted: {correct}")
        class_dict = {}
        labeled_dict = {"Correct": correct, "Classes": class_dict}
        for label in self.labels:
            class_dict[label] = "{:.3f}".format(predictions[self.config.label2id[label]].item())
        return labeled_dict
    def postprocess(self, data: dict, query):
        # data["Preprocessed"] = query  # No need
        return [data]  # TorchServe expects a list with one entry per request


_service = TopicClassifier()


def handle(data, context):
    try:
        if not _service.initialized:
            _service.initialize(context)
        if data is None:
            return None
        logger.log(logging.INFO, f"Received data: {data}")
        inputs, query = _service.preprocess(data)
        output_dict = _service.inference(inputs)
        outputs = _service.postprocess(output_dict, query)
        return outputs
    except Exception as e:
        logger.error(e)
        raise e
config.properties
inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model-store
models={"newmodel":{"1.0":{"defaultVersion":true,"marName":"testmodel.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}
load_models=all
install_py_dep_per_model=true
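With this config, both the inference and management APIs listen on port 8085 and the model is registered as newmodel with minWorkers=1/maxWorkers=5. A quick way to check how many workers were actually spawned, and to send a request in the shape the handler's preprocess() expects (host and example text are illustrative):

# Describe the registered model and its workers via the management API
curl http://localhost:8085/models/newmodel

# Send an inference request; the handler reads body["text"] (or a bare "text" field)
curl -X POST http://localhost:8085/predictions/newmodel \
     -H "Content-Type: application/json" \
     -d '{"text": "example query"}'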
Versions
TorchServe Version is 0.6.0
Repro instructions
Run TorchServe with the above config and handler (a possible start command is sketched below).
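The exact start command isn't shown in the issue; something along these lines would launch TorchServe against the config and model store above (paths taken from config.properties):

torchserve --start \
    --ts-config config.properties \
    --model-store /mnt/models/model-store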
Possible Solution
Create a model snapshot and use that instead of load_models=all (see the sketch below).
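A rough sketch of that approach, assuming the default filesystem snapshot serializer; the flag names are real TorchServe CLI options, but the snapshot path depends on the local setup:

# Start once without --ncs so TorchServe writes a snapshot of the registered models
torchserve --start --ts-config config.properties --model-store /mnt/models/model-store

# Later, restart from the last snapshot instead of relying on load_models=all
torchserve --stop
torchserve --start --model-store /mnt/models/model-store --ts-config <path-to-snapshot-cfg>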
Issue Analytics
- Created a year ago
- Comments: 8
Top GitHub Comments
@maaquib I will try to replicate your changes tomorrow and see how they go for me. From what I've seen, these are the differences between us (the two startup styles are sketched below):
- You pass --ncs where I didn't.
- You pass models all on the command line, where I'm doing that in the config file.
I will let you know how it goes once I try it out. Thanks for your support.
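For reference, the two startup styles being compared would look roughly like this (a sketch; the exact commands aren't quoted in the thread):

# CLI-driven startup: --ncs disables config snapshots, "all" loads every MAR in the store
torchserve --start --ncs --model-store /mnt/models/model-store --models all

# Config-driven startup: rely on load_models=all and the models=... entry in config.properties
torchserve --start --ts-config config.properties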
@maaquib it's a huggingface model so it's referenced by name:
CAMeL-Lab/bert-base-arabic-camelbert-msa-sixteenth
Let me know if you need any more info. So far I've run this on different machines (and on a Kubeflow cluster) and all of them print INITIALIZING TOPIC CLASSIFIER more than 20 times, which is a log line I placed in the initialize function of the handler, as seen above.
Necessary Edit: The model above is actually the base model; I am then fine-tuning it as a HuggingFace topic classifier on exactly 3 classes, Topic1 - Topic2 - Topic3. If it comes to it, I can send you the checkpoints and the required files to build a MAR out of it, since the data used is open source and the training is just a POC, nothing copyrighted/licensed yet.