
The speed of prediction gets much slower after deployment on the server

See original GitHub issue

My model is converted from Keras. Prediction speed is normal, less than 1 s, when I test it on localhost. But after I deployed it on my remote Ubuntu server behind Tomcat (HTTPS), prediction is much slower than on localhost, nearly 10 s.

    const tensor = tf.tensor(data, [1, 93], "int32");
    let value = sentiment_predict(tensor); // note: an async call returns a Promise, not a number

    async function sentiment_predict(tensor) {
        // The model is downloaded and rebuilt on every prediction call.
        const model = await tf.loadLayersModel(MODEL_URL);
        let result = model.predict(tensor).toString();
        let num = result.split('[')[2].split(',');
        let neg = parseFloat(num[0]);
        let pos = parseFloat(num[1]);
        return pos - neg;
    }

Maybe it’s because of the size of the model (6.3 MB)? Or because I load the model every time I want to predict? Why does this happen, and what should I do? I’d appreciate any help.
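
A quick way to narrow this down (not from the original thread, just a sketch reusing the same MODEL_URL and the 1×93 int32 input shown above) is to time the model load and the prediction separately. If most of the ~10 s goes to tf.loadLayersModel, the repeated download of the 6.3 MB model is the bottleneck rather than predict itself.

    // Timing sketch: measure model loading and prediction separately.
    // Assumes the same MODEL_URL and input data as in the snippet above.
    async function timeSentimentPredict(data) {
        const t0 = performance.now();
        const model = await tf.loadLayersModel(MODEL_URL); // fetches model.json + weight files
        const t1 = performance.now();
        const tensor = tf.tensor(data, [1, 93], "int32");
        const scores = await model.predict(tensor).data(); // wait for the actual output values
        const t2 = performance.now();
        console.log("load: " + (t1 - t0) + " ms, predict: " + (t2 - t1) + " ms", scores);
    }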

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
adwellj commented, Apr 4, 2019

Sounds like you might be trying to make a prediction before the model has finished loading.
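
If that is the case, awaiting the call before using the result makes the problem visible; a minimal sketch, assuming the sentiment_predict() from the question:

    // Sketch: await the async prediction so the value is a number, not a pending Promise.
    (async () => {
        const tensor = tf.tensor(data, [1, 93], "int32");
        const value = await sentiment_predict(tensor);
        console.log("sentiment score:", value);
    })();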

1 reaction
Swocky commented, Apr 4, 2019

Quoting an earlier reply: “@Swocky, I believe that model is going to be the promise returned by loadLayersModel() instead of the actual model. It likely works in your first example because of the await statement.”

Yeah, you are right. I tried this approach and the problem has basically been solved. Thanks.

    var model;
    loadModel();

    // Load the model once at startup instead of on every prediction.
    async function loadModel() {
        model = await tf.loadLayersModel(MODEL_URL);
    }

    async function sentiment_predict(tensor) {
        let result = model.predict(tensor).toString();
        let num = result.split('[')[2].split(',');
        let neg = parseFloat(num[0]);
        let pos = parseFloat(num[1]);
        return pos - neg;
    }
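
One caveat with the snippet above: if sentiment_predict() runs before loadModel() has resolved, model is still undefined. A variation that avoids both the reload-per-call and that race is to cache the load promise and await it inside the predict function. This is only a sketch, and it assumes the model outputs a [1, 2] tensor of [negative, positive] scores, as the string parsing above implies.

    // Sketch: load the model once, and let every prediction await the same promise.
    let modelPromise = null;
    function getModel() {
        if (!modelPromise) {
            modelPromise = tf.loadLayersModel(MODEL_URL);
        }
        return modelPromise;
    }

    async function sentiment_predict(tensor) {
        const model = await getModel();
        // Read the raw scores instead of parsing toString().
        const [neg, pos] = await model.predict(tensor).data();
        return pos - neg;
    }
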
Read more comments on GitHub >

Top Results From Across the Web

Minimizing real-time prediction serving latency in machine ...
An ML model is useful only if it's deployed and ready to make predictions, but building an adapted ML serving system requires the...
Read more >
Why does keras model predict slower after compile?
Yes, both are possible, and it will depend on (1) data size; (2) model size; (3) hardware. Code at the bottom actually shows...
Read more >
Optimizing Models for Deployment and Inference - neptune.ai
Batches of images of products could be taken simultaneously to the prediction, and thus sequentially inferencing each image would actually slow ...
Read more >
There are two very different ways to deploy ML models, here's ...
It can make calls to a backend server to get results, which it then maybe ... and storing models or predictions to the...
Read more >
Performance Guide | TFX - TensorFlow
This is due to a better potential for multi-tenant deployment to utilize the hardware and lower fixed costs (RPC server, TensorFlow runtime, etc.)...
Read more >

