How do you do inference in production?
❓ Questions & Help
Details
I was wondering: how do you do inference in production? I tried to convert this model to a TensorFlow SavedModel but failed.
This is what I tried:
import tensorflow as tf
from transformers import TFGPT2LMHeadModel

tf_model = TFGPT2LMHeadModel.from_pretrained("tmp/", from_pt=True)
tf.saved_model.save(tf_model, "tmp/saved")
loaded = tf.saved_model.load("tmp/saved")
print(list(loaded.signatures.keys()))
And it returns an empty list.
A link to original question on Stack Overflow: https://stackoverflow.com/questions/52826134/keras-model-subclassing-examples
This part is only here to evaluate the model and write its predictions on the test set to a file; it is not meant for inference in production. These are two distinct cases.
Yes, it is normal, because predict is only there to evaluate your model on a dataset, and it is initialized not from the checkpoint directory but from the .h5 file in your model folder only.

This is normal because your input doesn't correspond to the signature. The big picture is that from the loaded_model(...) line you don't get features, you get the real output of the model; that is what a saved model does: a tensor of values for each token, where each value is the probability of the corresponding label. Hence, once you have your saved model, run the command:
Now, you have an API that wraps your model. Finally, in a Python script you can do:
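For illustration, a minimal sketch of such a script, assuming the saved model is served behind TensorFlow Serving's REST API (the port, model name, and input key below are placeholder assumptions, not from the original comment):

import requests

# Placeholder endpoint: TensorFlow Serving's REST API with a model named "gpt2".
url = "http://localhost:8501/v1/models/gpt2:predict"

# Token ids produced by your tokenizer; the input key must match the exported signature.
payload = {"instances": [{"input_ids": [31373, 11, 616, 3290]}]}

response = requests.post(url, json=payload)
response.raise_for_status()
predictions = response.json()["predictions"]
print(predictions)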
Finally, you get your predictions, and you have to code the preds -> text translation yourself.
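The thread does not show that translation step; a minimal sketch, assuming the output is per-token label probabilities and using a hypothetical id2label mapping:

import numpy as np

# Dummy predictions with shape (batch, seq_len, num_labels): one value per label for every token.
predictions = np.random.rand(1, 4, 3)

# Hypothetical id-to-label mapping; in practice this comes from your model's config.
id2label = {0: "O", 1: "B-PER", 2: "I-PER"}

label_ids = predictions.argmax(axis=-1)  # most probable label for each token
decoded = [[id2label[int(i)] for i in seq] for seq in label_ids]
print(decoded)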
This is totally normal; as I told you, you have to code your own signature, as shown in the TF documentation I linked in my previous post.
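The linked documentation is not reproduced here; a minimal sketch of exporting with an explicit signature, assuming TF 2.x and that the model's first output is the logits, could look like this:

import tensorflow as tf
from transformers import TFGPT2LMHeadModel

model = TFGPT2LMHeadModel.from_pretrained("tmp/", from_pt=True)

# Wrap the model call in a tf.function with an explicit input signature
# so that the exported SavedModel gets a 'serving_default' signature.
@tf.function(input_signature=[tf.TensorSpec(shape=(None, None), dtype=tf.int32, name="input_ids")])
def serving_fn(input_ids):
    outputs = model(input_ids)
    return {"logits": outputs[0]}

tf.saved_model.save(model, "tmp/saved", signatures={"serving_default": serving_fn})

loaded = tf.saved_model.load("tmp/saved")
print(list(loaded.signatures.keys()))  # ['serving_default']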
For now, nothing is implemented in the transformers lib to do what you are looking for with a saved model. It means that, to do inference in production with a saved model, you have to code all the logic I explained above yourself. It is planned to integrate this part in the near future, and it is even an ongoing work, but it is far from finished.

@jx669 Were you able to solve the error in the cell in this notebook
print(loaded_model(features, training=False))  # not working
where it asks for the input_ids to be of shape (None, 5)? I have been facing the exact same issue and have no clue how to solve it.