
The speed of prediction gets much slower after deployment on the server

See original GitHub issue

My model is converted from Keras. Prediction speed is normal, less than 1 s, when I test it on localhost. But after I deployed it on my remote Ubuntu server behind Tomcat (HTTPS), prediction is much slower than on localhost, nearly 10 s.

    const tensor = tf.tensor(data, [1, 93], "int32");
    let value = sentiment_predict(tensor); // note: an async call returns a Promise, not a number

    async function sentiment_predict(tensor) {
        // The model is downloaded and rebuilt on every prediction call.
        const model = await tf.loadLayersModel(MODEL_URL);
        let result = model.predict(tensor).toString();
        let num = result.split('[')[2].split(',');
        let neg = parseFloat(num[0]);
        let pos = parseFloat(num[1]);
        return pos - neg;
    }

Maybe it’s because of the size of the model (6.3 MB)? Or because I load the model every time I want to predict? Why does this happen, and what should I do? I’d appreciate any help.
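
A quick way to narrow this down (not from the original thread, just a sketch reusing the same MODEL_URL and the 1×93 int32 input shown above) is to time the model load and the prediction separately. If most of the ~10 s goes to tf.loadLayersModel, the repeated download of the 6.3 MB model is the bottleneck rather than predict itself.

    // Timing sketch: measure model loading and prediction separately.
    // Assumes the same MODEL_URL and input data as in the snippet above.
    async function timeSentimentPredict(data) {
        const t0 = performance.now();
        const model = await tf.loadLayersModel(MODEL_URL); // fetches model.json + weight files
        const t1 = performance.now();
        const tensor = tf.tensor(data, [1, 93], "int32");
        const scores = await model.predict(tensor).data(); // wait for the actual output values
        const t2 = performance.now();
        console.log("load: " + (t1 - t0) + " ms, predict: " + (t2 - t1) + " ms", scores);
    }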

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
adwellj commented, Apr 4, 2019

Sounds like you might be trying to make a prediction before the model has finished loading.
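
If that is the case, awaiting the call before using the result makes the problem visible; a minimal sketch, assuming the sentiment_predict() from the question:

    // Sketch: await the async prediction so the value is a number, not a pending Promise.
    (async () => {
        const tensor = tf.tensor(data, [1, 93], "int32");
        const value = await sentiment_predict(tensor);
        console.log("sentiment score:", value);
    })();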

1 reaction
Swocky commented, Apr 4, 2019

Quoting an earlier reply: “@Swocky, I believe that model is going to be the promise returned by loadLayersModel() instead of the actual model. It likely works in your first example because of the await statement.”

Yeah, you are right. I tried this approach and the problem has basically been solved. Thanks.

    var model;
    loadModel();

    // Load the model once at startup instead of on every prediction.
    async function loadModel() {
        model = await tf.loadLayersModel(MODEL_URL);
    }

    async function sentiment_predict(tensor) {
        let result = model.predict(tensor).toString();
        let num = result.split('[')[2].split(',');
        let neg = parseFloat(num[0]);
        let pos = parseFloat(num[1]);
        return pos - neg;
    }
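
One caveat with the snippet above: if sentiment_predict() runs before loadModel() has resolved, model is still undefined. A variation that avoids both the reload-per-call and that race is to cache the load promise and await it inside the predict function. This is only a sketch, and it assumes the model outputs a [1, 2] tensor of [negative, positive] scores, as the string parsing above implies.

    // Sketch: load the model once, and let every prediction await the same promise.
    let modelPromise = null;
    function getModel() {
        if (!modelPromise) {
            modelPromise = tf.loadLayersModel(MODEL_URL);
        }
        return modelPromise;
    }

    async function sentiment_predict(tensor) {
        const model = await getModel();
        // Read the raw scores instead of parsing toString().
        const [neg, pos] = await model.predict(tensor).data();
        return pos - neg;
    }
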
Read more comments on GitHub >

Top Results From Across the Web

Minimizing real-time prediction serving latency in machine ...
An ML model is useful only if it's deployed and ready to make predictions, but building an adapted ML serving system requires the...
Read more >
Why does keras model predict slower after compile?
Yes, both are possible, and it will depend on (1) data size; (2) model size; (3) hardware. Code at the bottom actually shows...
Read more >
Optimizing Models for Deployment and Inference - neptune.ai
Batches of images of products could be taken simultaneously to the prediction, and thus sequentially inferencing each image would actually slow ...
Read more >
There are two very different ways to deploy ML models, here's ...
It can make calls to a backend server to get results, which it then maybe ... and storing models or predictions to the...
Read more >
Performance Guide | TFX - TensorFlow
This is due to a better potential for multi-tenant deployment to utilize the hardware and lower fixed costs (RPC server, TensorFlow runtime, etc.)...
Read more >

