
load_models all loading the same model 10 times before going unresponsive

See original GitHub issue

🐛 Describe the bug

This is a continuation of #1779 as that discussion took a different route.

Error logs

Logs too long, moved to pastebin

Installation instructions

docker run --rm -it -p 8085:8085 -v $(pwd):/home/model-server/ pytorch/torchserve bash

Model Packaging

Packaged using model archiver 0.6

Model Handler

from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification
import anltk
import unicodedata
import torch

# Guide dependencies
from abc import ABC
import logging
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)

class TopicClassifier(BaseHandler, ABC):
    def __init__(self):
        super().__init__()
        self.initialized = False

    def initialize(self, ctx):
        logger.log(logging.INFO, "=============INITIALIZING TOPIC CLASSIFIER=============")
        self.manifest = ctx.manifest

        properties = ctx.system_properties
        model_path = properties.get("model_dir")
        self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")

        self.config = AutoConfig.from_pretrained(model_path)
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, config=self.config)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path, config=self.config)

        self.model.to(self.device)
        self.model.eval()

        # Optional Mapping File
        self.labels = list(self.config.label2id.keys())


        logger.log(logging.INFO, "Initialized Topic Classifier")
        self.initialized = True

    def preprocess_(self, query: str) -> str:
        query_ = unicodedata.normalize('NFC', query)
        query_ = ' '.join(anltk.tokenize_words(query_))
        query_ = anltk.remove_non_alpha(query_, stop_list=' ?,:".')
        query_ = anltk.fold_white_spaces(query_)
        query_ = query_.strip()
        return query_

    def preprocess(self, data):
        logger.log(logging.INFO, "Preprocessing started")
        logger.log(logging.INFO, f"data is {data}")
        query = data[0]
        # Accept either {"body": {"text": ...}} or a bare {"text": ...} payload
        query = query.get("body", {"text": query.get("text", "")}).get("text", "")
        if not query.strip():
            raise ValueError("No text found in query")

        # Normalize and clean the raw text once
        query_ = self.preprocess_(query)
        logger.log(logging.INFO, f"query is {query_}")

        # tokens = self.tokenizer.tokenize(query_) # Debugging only

        encoded_dict = self.tokenizer.encode_plus(
            query_,                      # Sentence to encode
            add_special_tokens=True,     # Add '[CLS]' and '[SEP]'
            max_length=512,              # Pad & truncate all sentences
            truncation=True,
            padding='max_length',        # Padding strategy
            return_attention_mask=True,  # Construct attention masks
            return_tensors='pt',         # Return PyTorch tensors
        )
        # return encoded_dict, tokens, query_
        return encoded_dict, query_
    
    def inference(self, inputs):
        logger.log(logging.INFO, "Inference started")
        with torch.no_grad():
            for key in inputs:  # Move tensor inputs to the target device first
                try:
                    inputs[key] = inputs[key].to(self.device)
                except AttributeError:
                    pass  # non-tensor values stay as they are
            outputs = self.model(**inputs)
        
        predictions = torch.nn.functional.softmax(outputs[0].squeeze(), dim=0)
        pred = torch.argmax(predictions, dim=0)

        correct = self.labels[pred.item()]

        logger.log(logging.INFO, f"Predicted: {correct}")

        class_dict = {}
        for label in self.labels:
            class_dict[label] = "{:.3f}".format(predictions[self.config.label2id[label]].item())
        labeled_dict = {"Correct": correct, "Classes": class_dict}

        return labeled_dict

    def postprocess(self, data: dict, query):
        # data["Preprocessed"] = query # No Need
        return [data] # Return the data as is but in a list

_service = TopicClassifier()

def handle(data, context):
    try:
        if not _service.initialized:
            _service.initialize(context)
        
        if data is None:
            return None
        
        logger.log(logging.INFO, f"Received data: {data}")
        
        inputs, query = _service.preprocess(data)
        output_dict = _service.inference(inputs)
        outputs = _service.postprocess(output_dict, query)

        return outputs
    except Exception as e:
        logger.error(e)
        raise e
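For reference, a request shaped the way this preprocess expects (a JSON body with a text field) might look like the following; the model name and port come from the config.properties below, and the path assumes TorchServe's standard inference API:

curl -X POST "http://localhost:8085/predictions/newmodel" \
     -H "Content-Type: application/json" \
     -d '{"text": "an example query"}'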

config.properties

inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model-store
models={"newmodel":{"1.0":{"defaultVersion":true,"marName":"testmodel.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}
load_models=all
install_py_dep_per_model=true
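Note that this config both registers newmodel explicitly via models=... and asks for load_models=all. One hedged way to rule out that interaction is to remove both lines and register the model through TorchServe's management API after startup (parameter names per the standard register-model endpoint; the values mirror the JSON above):

curl -X POST "http://localhost:8085/models?url=testmodel.mar&model_name=newmodel&initial_workers=1&batch_size=1&max_batch_delay=5000"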

Versions

TorchServe Version is 0.6.0

Repro instructions

Run TorchServe with the config.properties and handler shown above.
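A sketch of what that repro might look like end to end inside the container; the archive file names (pytorch_model.bin, config.json, and so on) are assumptions, not taken from the issue:

torch-model-archiver \
    --model-name testmodel \
    --version 1.0 \
    --handler handler.py \
    --serialized-file pytorch_model.bin \
    --extra-files "config.json,tokenizer_config.json,vocab.txt" \
    --export-path /mnt/models/model-store

torchserve --start --model-store /mnt/models/model-store --ts-config config.properties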

Possible Solution

Create a model snapshot and use that instead of load_models=all.
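By default (i.e. when TorchServe is not started with --ncs), each successful startup writes a timestamped snapshot under logs/config/. A hedged sketch of that workaround, with the snapshot file name left as a placeholder:

torchserve --stop
torchserve --start --model-store /mnt/models/model-store --ts-config logs/config/<timestamp>-snapshot.cfg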

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 8

Top GitHub Comments

1 reaction
ridhwan-saal commented, Aug 11, 2022

@maaquib I will try to replicate your changes tomorrow and see how they go for me. From what I've seen, these are the differences between our setups:

  • You used --ncs where I didn’t.
  • You pass models all on the torchserve command line, where I set models in the config file.
  • In your config you commented out the models line, where mine keeps it (perhaps that's the reason?).

I will let you know how it goes once I try it out. Thanks for your support.

1 reaction
ridhwan-saal commented, Aug 9, 2022

@maaquib it’s a huggingface model so it’s referenced by name: CAMeL-Lab/bert-base-arabic-camelbert-msa-sixteenth

Let me know if you need any more info. So far I've run this on several machines (and on a Kubeflow cluster), and all of them print INITIALIZING TOPIC CLASSIFIER more than 20 times; that is the log line I placed in the handler's initialize function, as seen above.
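Assuming the default TorchServe logging setup, which writes handler output to logs/model_log.log, one quick way to count those initializations is:

grep -c "INITIALIZING TOPIC CLASSIFIER" logs/model_log.log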

Necessary Edit: The model above is actually the base model; I then fine-tune it with HuggingFace for topic classification on exactly 3 classes: Topic1, Topic2, Topic3. If it comes to it, I can send you the checkpoints and the files required to build a MAR from them, since the data used is open source and the training is just a POC, with nothing copyrighted/licensed yet.
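For context, a minimal sketch of how such a fine-tuning head could be attached with HuggingFace; the label names match the comment above, and the label2id mapping recorded here is what the handler later reads through self.config.label2id:

from transformers import AutoModelForSequenceClassification

# Hypothetical 3-way classification head on the base model named above
model = AutoModelForSequenceClassification.from_pretrained(
    "CAMeL-Lab/bert-base-arabic-camelbert-msa-sixteenth",
    num_labels=3,
    id2label={0: "Topic1", 1: "Topic2", 2: "Topic3"},
    label2id={"Topic1": 0, "Topic2": 1, "Topic3": 2},
)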
