Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Very slow inference in 0.5.11

See original GitHub issue

After training a default classifier, then saving and loading it, model.predict("lorem ipsum") and model.predict_proba take 14 seconds on average, even on a hefty server such as an AWS p3.16xlarge.
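For context, the delay shows up on every individual predict call because the prediction graph is rebuilt each time (see the maintainer's comment below). A minimal sketch of how one might reproduce and time this, assuming finetune's documented Classifier API and that save/load follow the model.save(path) / Classifier.load(path) pattern; the file path and training data names are placeholders:

import time
from finetune import Classifier

# Train and persist a default classifier (hyperparameters left at defaults).
model = Classifier()
model.fit(train_texts, train_labels)        # train_texts / train_labels: your own data
model.save("classifier.model")              # illustrative path

# Reload and time individual predictions.
model = Classifier.load("classifier.model")
for _ in range(3):
    start = time.perf_counter()
    model.predict(["lorem ipsum"])          # single-item batch
    print(f"predict took {time.perf_counter() - start:.1f}s")
# Without caching, each call pays the full graph-construction cost again.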

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 17 (17 by maintainers)

Top GitHub Comments

3 reactions
dimidd commented, Nov 29, 2018

Thanks. For my use case (serving a model as an API), a context manager doesn’t fit, since I need to call predict after an external event (e.g. an HTTP request), so I’m just calling _cached_inference directly. Anyhow, I think we can finally close this issue. Thanks a lot for your great work!
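If you would rather avoid the private _cached_inference method, one alternative for a long-running server is to enter the public cached_predict() context manager (shown in the comment below) once at startup and keep it open with contextlib.ExitStack. A hedged sketch, with the load path and handler name invented for illustration:

from contextlib import ExitStack
from finetune import Classifier

model = Classifier.load("classifier.model")   # illustrative path

# Enter cached_predict() once at server startup and keep it open,
# so every request reuses the already-built prediction graph.
_stack = ExitStack()
_stack.enter_context(model.cached_predict())

def handle_request(text):                     # hypothetical HTTP handler
    return model.predict([text])

# On shutdown, release the cached graph:
# _stack.close()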

1 reaction
madisonmay commented, Nov 22, 2018

Hi @dimidd,

Thanks for checking back in! Although I was hoping to end up with a solution where we could have our metaphorical cake and eat it too, we ran into some limitations in how TensorFlow handles cleaning up memory, which meant we had to opt for a more explicit prediction interface if you want to avoid rebuilding the graph: https://finetune.indico.io/#prediction

from finetune import Classifier

model = Classifier()
model.fit(train_data, train_labels)
with model.cached_predict():
    model.predict(test_data)  # triggers prediction graph construction
    model.predict(test_data)  # graph is already cached, so subsequent calls are faster

Let me know if this solution works for you!
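To verify the speedup on your own model, you can time the first call inside cached_predict() (which still builds the graph once) against a subsequent call. A rough sketch, reusing the model and test_data from the snippet above:

import time

def timed_predict(model, data):
    start = time.perf_counter()
    model.predict(data)
    return time.perf_counter() - start

with model.cached_predict():
    first = timed_predict(model, test_data)    # pays graph construction once
    second = timed_predict(model, test_data)   # reuses the cached graph
print(f"first call: {first:.1f}s, second call: {second:.1f}s")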

Read more comments on GitHub >

Top Results From Across the Web

ONNX Inference Speed extremely slow compare to .pt Model
Hi, I tried to inference an image of resolution 1024*1536 using onnx and .pt model As you can see the huge time difference...
Read more >
On-device inference is slow - ideas on how to speed it up?
My suspicion is that a simpler architecture (while it might lead to lower accuracy) could lead to faster predictions. I am targeting something ......
Read more >
TF-TRT model very slow to load, with poor performance
I have 2 issues: it takes about 25min to get the model ready to run in the inference script, I'd like to have...
Read more >
Improving Inference Speeds of Transformer Models - Medium
“With great models comes slower inference speeds”. Deep Learning has evolved immensely and it has Transforme(r)d NLP completely in the past ...
Read more >
TF.Keras model.predict is slower than straight Numpy?
It is true that numpy doesn't operate on GPU, so unlike tf-gpu , it doesn't encounter any data shifting overhead. But also it's...
Read more >
