How do you do inference in production?
❓ Questions & Help
Details
I was wondering: how do you do inference in production? I tried to convert this model to a TensorFlow SavedModel but failed.
This is what I tried:
import tensorflow as tf
from transformers import TFGPT2LMHeadModel

tf_model = TFGPT2LMHeadModel.from_pretrained("tmp/", from_pt=True)
tf.saved_model.save(tf_model, "tmp/saved")
loaded = tf.saved_model.load("tmp/saved")
print(list(loaded.signatures.keys()))
And it returns an empty list.
A link to original question on Stack Overflow: https://stackoverflow.com/questions/52826134/keras-model-subclassing-examples
This part is only here to evaluate the model and write its predictions on the test set to a file; it is not meant for inference in production. These are two distinct cases.
Yes, it is normal, because predict is only there to evaluate your model on a dataset, and it is initialized not from the checkpoint directory but from the .h5 file in your model folder only.

This is normal because your input doesn't correspond to the signature. The big picture is that from the loaded_model(...) line you don't get features, you get the real output of the model; that is what a saved model does: a tensor of values for each token, where each value is the probability of the corresponding label. Hence, once you have your saved model, run the command:
Now, you have an API that wraps your model. Finally, in a Python script you can do:
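For illustration, a minimal sketch of such a script, assuming the saved model is served behind TensorFlow Serving's REST API (the port, model name, and input key below are placeholder assumptions, not from the original comment):

import requests

# Placeholder endpoint: TensorFlow Serving's REST API with a model named "gpt2".
url = "http://localhost:8501/v1/models/gpt2:predict"

# Token ids produced by your tokenizer; the input key must match the exported signature.
payload = {"instances": [{"input_ids": [31373, 11, 616, 3290]}]}

response = requests.post(url, json=payload)
response.raise_for_status()
predictions = response.json()["predictions"]
print(predictions)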
Finally, you get your predictions, and you have to code the preds -> text translation yourself.
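The thread does not show that translation step; a minimal sketch, assuming the output is per-token label probabilities and using a hypothetical id2label mapping:

import numpy as np

# Dummy predictions with shape (batch, seq_len, num_labels): one value per label for every token.
predictions = np.random.rand(1, 4, 3)

# Hypothetical id-to-label mapping; in practice this comes from your model's config.
id2label = {0: "O", 1: "B-PER", 2: "I-PER"}

label_ids = predictions.argmax(axis=-1)  # most probable label for each token
decoded = [[id2label[int(i)] for i in seq] for seq in label_ids]
print(decoded)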
This is totally normal; as I told you, you have to code your own signature, as shown in the TF documentation I linked in my previous post.
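The linked documentation is not reproduced here; a minimal sketch of exporting with an explicit signature, assuming TF 2.x and that the model's first output is the logits, could look like this:

import tensorflow as tf
from transformers import TFGPT2LMHeadModel

model = TFGPT2LMHeadModel.from_pretrained("tmp/", from_pt=True)

# Wrap the model call in a tf.function with an explicit input signature
# so that the exported SavedModel gets a 'serving_default' signature.
@tf.function(input_signature=[tf.TensorSpec(shape=(None, None), dtype=tf.int32, name="input_ids")])
def serving_fn(input_ids):
    outputs = model(input_ids)
    return {"logits": outputs[0]}

tf.saved_model.save(model, "tmp/saved", signatures={"serving_default": serving_fn})

loaded = tf.saved_model.load("tmp/saved")
print(list(loaded.signatures.keys()))  # ['serving_default']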
For now, nothing is implemented in the transformers lib to do what you are looking for with a saved model. It means that, to do inference in production with a saved model, you have to code all the logic I explained above yourself. It is planned to integrate this part in the near future, and it is even an ongoing work, but it is far from finished.

@jx669 Were you able to solve the error in the cell in this notebook
print(loaded_model(features, training=False))  # not working
where it asks for the input_ids to be of shape (None, 5)? I have been facing the exact same issue and have no clue how to solve it.