Very slow inference in 0.5.11
See original GitHub issue

After training a default classifier, then saving and loading it, model.predict("lorem ipsum") and model.predict_proba take on average 14 seconds per call, even on a hefty server such as an AWS p3.16xlarge.
Issue Analytics
- Created: 5 years ago
- Comments: 17 (17 by maintainers)

Thanks, for my use case (serving a model as an API), a context manager doesn’t fit, since I need to call predict after an external event (e.g. an HTTP request), so I’m just calling _cached_inference directly. Anyhow, I think we can finally close this issue. Thanks a lot for your great work!

Hi @dimidd,
Thanks for checking back in! Although I was hoping to end up with a solution where we could have our metaphorical cake and eat it too, we ran into some limitations with how TensorFlow cleans up memory, which meant we had to opt for a more explicit interface for prediction if you want to avoid rebuilding the graph: https://finetune.indico.io/#prediction
Let me know if this solution works for you!
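For reference, the two usage patterns discussed above look roughly like the sketch below. The cached_predict context-manager name reflects my reading of the linked prediction docs and may differ between versions; the Flask app, route, and ExitStack workaround are illustrative and not taken from the thread (the commenter instead calls the private _cached_inference method directly).

```python
from contextlib import ExitStack

from flask import Flask, jsonify, request
from finetune import Classifier

model = Classifier.load("classifier.model")  # hypothetical path

# Batch usage: keep the underlying graph alive for the whole block so that
# repeated predict calls don't rebuild it each time.
with model.cached_predict():
    for text in ["lorem ipsum", "dolor sit amet"]:
        model.predict([text])

# Server usage: predictions are triggered by external events (HTTP requests),
# so a with-block doesn't fit naturally. One option is to enter the context
# manager once at startup and hold it open for the lifetime of the process.
app = Flask(__name__)
stack = ExitStack()
stack.enter_context(model.cached_predict())

@app.route("/predict", methods=["POST"])
def predict():
    text = request.get_json()["text"]
    return jsonify({"label": str(model.predict([text])[0])})
```

Holding the context open trades memory for latency; whether that or calling _cached_inference directly is preferable will depend on the finetune version in use.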